Interesting, so you're suggesting that I actually disable the HW acceleration and test without it? I was curious about that myself. If so, I will make a new .config from scratch, to eliminate any inter-dependencies that have crept in, and try a clean build without HW accel. Yes, my goal is purely speed, not offloading work.
WRT1900ACS without CESA, aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc     32470.09k    34938.86k    35737.09k    35967.66k    36088.64k

WRT1900ACS with CESA, aes-256-cbc, command line 'openssl speed -elapsed -evp aes-256-cbc'
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       896.45k     3902.31k    14125.06k    41816.41k    79828.31k
Summary: software is significantly faster than hardware unless you are encrypting large chunks of data at a time. In these runs CESA only pulls ahead at 1024-byte blocks and larger. Speed is VERY use case dependent.
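To put rough numbers on that summary, here is a quick Python sketch that compares the two runs block size by block size. The figures are copied straight from the openssl output above (openssl reports them in 1000s of bytes per second); the function names `hw_over_sw` and `crossover` are just made up for this illustration.

```python
# Throughput from the two openssl speed runs above, in 1000s of bytes/sec.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
SOFTWARE = [32470.09, 34938.86, 35737.09, 35967.66, 36088.64]  # without CESA
CESA = [896.45, 3902.31, 14125.06, 41816.41, 79828.31]         # with CESA


def hw_over_sw(sw, hw):
    """Return the hardware/software throughput ratio for each block size."""
    return [h / s for s, h in zip(sw, hw)]


def crossover(sizes, sw, hw):
    """Return the first block size where hardware beats software, or None."""
    for size, s, h in zip(sizes, sw, hw):
        if h > s:
            return size
    return None


if __name__ == "__main__":
    for size, ratio in zip(BLOCK_SIZES, hw_over_sw(SOFTWARE, CESA)):
        print(f"{size:>5} bytes: CESA runs at {ratio:.2f}x the software speed")
    print("CESA first wins at", crossover(BLOCK_SIZES, SOFTWARE, CESA), "bytes")
```

With these numbers CESA runs at under 0.03x the software speed for 16-byte blocks and only reaches about 2.2x at 8192 bytes, so which one wins depends entirely on the block sizes your workload actually uses.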