I'm using Intel's SHA hardware acceleration instructions (sha256rnds2, etc., implementation here), and my throughput is about 30% lower than with OpenSSL's software SHA256.
I'm hashing a single SHA256 block (64 bytes), twice. For comparison, I get around 100 M/s without any SHA256, 50 M/s with OpenSSL's SHA256 (two 64-byte blocks), and 35 M/s with Intel's SHA instructions.
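With 60 GHz in total (24 cores × 2.5 GHz, with 2-way Hyper-Threading), that's around 600 cycles spent on the two software SHA256 blocks (60 GHz / 50 M/s = 1200 cycles per iteration, minus the 600-cycle baseline at 100 M/s), while the same work with the accelerated instructions takes around 1100 cycles.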
Is this expected?