I have a box built on the the Intel C2558 SoC, using a Supermicro A1SRi-2558F motherboard. It has 16GB of RAM installed. One of the selling points of this platform is its crypto hardware acceleration. However, even two years after I built the machine, support is still not included in any mainline kernel release. Quite disappointing to say the least.
Anyway, I finally got around to cross-compiling all the necessary drivers to get Quick Assist acceleration working.
It was pain in the a** to cross compile, since Intel loves to put .tar.gz files into zip files and use build shell scripts to compile that assume that host and target are the same. Just finding the software was an exercise in frustration. Oh, and then the documentation - a 60 page document spends 90% of the time telling you how to install Fedora rather than dealing with the actual software. I had to read the code to know how to use the software.
Anyway, rant over since I've got it working. It required 5 kernel modules, two of which were trivial since they're already in the kernel but not in OpenWrt and three of which were non-trivial and required patches to compile and function due to changes in the kernel over the lifetime of several releases.
I will, in due course, release a build for this platform along with the kernel modules and other associated packages. It will also work on the C2758. The kernel packages will also work on the DH89XX chipsets (not tested, but I can see this from the driver code), which populate some PCI acceleration boards from Intel.
The Intel drivers I ported consist of the quickassist driver, which supplies a kernel and user-space interface and a netkey shim, which provides kernel crypto acceleration (and works with strongswan and other ipsec implementations).
There are also some patches for zlib which supply compression acceleration (and also require a new kernel module), however after getting the patched version working, I discovered that the C2558 doesn't have compression acceleration, only crypto. The DH89xx does.
There appear to be some patches for openssl, but these require asynchronous operation and so need to be ported to openssl 1.1.0. I built openssl 1.1.0 for openwrt, but the new API breaks many packages, so it can't be used easily. I'll keep working on openssl acceleration, but it's not working at the moment.
The benchmarks below use the kernel crypto framework to do the transforms. Depending on keysize and algorithm, encrypt and decrypt performance for the transform range between 2Gbps - 4Gbps.
I did the benchmarks on my live router and happened to be browsing the net at the time. However, they're still pretty impressive for a box that probably runs at around 10w most of the time.
Loading AEAD Performance Test module ...
AEAD givencrypt performance: alg authenc(hmac(sha1),cbc(aes)) - keysize(128)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 99446457576
CPU frequency: 2399 MHz
Throughput: 3952 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(sha1),cbc(aes)) - keysize(128)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 106091253696
CPU frequency: 2399 MHz
Throughput: 3704 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(sha256),cbc(aes)) - keysize(128)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 99558529560
CPU frequency: 2399 MHz
Throughput: 3947 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(sha256),cbc(aes)) - keysize(128)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 122812906752
CPU frequency: 2399 MHz
Throughput: 3200 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(sha512),cbc(aes)) - keysize(128)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 134119874664
CPU frequency: 2399 MHz
Throughput: 2930 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(sha512),cbc(aes)) - keysize(128)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 155291391888
CPU frequency: 2399 MHz
Throughput: 2531 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(md5),cbc(aes)) - keysize(128)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 100111379832
CPU frequency: 2399 MHz
Throughput: 3926 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(md5),cbc(aes)) - keysize(128)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 130504449000
CPU frequency: 2399 MHz
Throughput: 3011 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(sha1),cbc(aes)) - keysize(256)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 117191387712
CPU frequency: 2399 MHz
Throughput: 3353 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(sha1),cbc(aes)) - keysize(256)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 107436501192
CPU frequency: 2399 MHz
Throughput: 3658 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(sha256),cbc(aes)) - keysize(256)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 113961557256
CPU frequency: 2399 MHz
Throughput: 3448 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(sha256),cbc(aes)) - keysize(256)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 122816144352
CPU frequency: 2399 MHz
Throughput: 3200 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(sha512),cbc(aes)) - keysize(256)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 134154360960
CPU frequency: 2399 MHz
Throughput: 2929 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(sha512),cbc(aes)) - keysize(256)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 160118259912
CPU frequency: 2399 MHz
Throughput: 2454 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(md5),cbc(aes)) - keysize(256)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 115505439384
CPU frequency: 2399 MHz
Throughput: 3402 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(md5),cbc(aes)) - keysize(256)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 125164367592
CPU frequency: 2399 MHz
Throughput: 3140 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(sha1),cbc(des3_ede)) - keysize(192)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 191244542784
CPU frequency: 2399 MHz
Throughput: 2055 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(sha1),cbc(des3_ede)) - keysize(192)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 154427025720
CPU frequency: 2399 MHz
Throughput: 2545 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(sha256),cbc(des3_ede)) - keysize(192)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 201139307760
CPU frequency: 2399 MHz
Throughput: 1954 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(sha256),cbc(des3_ede)) - keysize(192)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 154425352248
CPU frequency: 2399 MHz
Throughput: 2545 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(sha512),cbc(des3_ede)) - keysize(192)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 190371471624
CPU frequency: 2399 MHz
Throughput: 2064 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(sha512),cbc(des3_ede)) - keysize(192)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 156217053072
CPU frequency: 2399 MHz
Throughput: 2516 Mbps
-----------------------------------------------
AEAD givencrypt performance: alg authenc(hmac(md5),cbc(des3_ede)) - keysize(192)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 199082617056
CPU frequency: 2399 MHz
Throughput: 1974 Mbps
-----------------------------------------------
AEAD decrypt performance: alg authenc(hmac(md5),cbc(des3_ede)) - keysize(192)
-----------------------------------------------
Number threads: 4
Number of requests per thread: 5000000
Pkt Size: 1024
Total number of Cycles: 154426223232
CPU frequency: 2399 MHz
Throughput: 2545 Mbps
-----------------------------------------------