30

For example, x86 has an instruction set extension for hardware-accelerated AES. But are there any x86 instructions to accelerate SHA (SHA-1/2/256/512) hashing, and which library is the fastest for SHA hashing on x86?

Alex
  • 7
    Read that http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1 – alexbuisson Dec 19 '13 at 22:10
  • 2
    Whoever voted to close this must surely have done it by mistake? – TonyK Dec 19 '13 at 22:30
  • @TonyK Probably due to the question asking for *The* fastest library, which is likely to attract opinionated responses. That part of the question is unanswerable. – IInspectable Dec 20 '13 at 00:15
  • 4
    There will be such instructions, called [Intel SHA Extensions](http://software.intel.com/en-us/articles/intel-sha-extensions) in the upcoming [Skylake](http://en.wikipedia.org/wiki/Skylake_(microarchitecture)) architecture. – CodesInChaos Dec 20 '13 at 10:37
  • 1
    Making SHA slow is a feature, much harder to crack it that way. Inevitably we'll just end up with more bits. – Hans Passant Dec 20 '13 at 17:26
  • 11
    @HansPassant That's a very uninformed remark; slowness is certainly not a good feature for a secure hash algorithm. The SHA-3 candidates were certainly chosen for both security and speed (and difference in architecture from SHA-2, in the end) but will have exactly the same number of output bits as SHA-2. Slowness can be of use for certain algorithms that *use* secure hashes, like PBKDFs, where it is used for key strengthening. – Maarten Bodewes Dec 20 '13 at 23:27

4 Answers

24

Intel has upcoming instructions for accelerating the calculation of SHA-1 and SHA-256 hashes.

You can read more about them, how to detect whether your CPU supports them, and how to use them here.

(But not SHA-512, you'll still need to manually vectorize that with regular SIMD instructions. AVX512 should help for SHA-512 (and for SHA-1 / SHA-256 on CPUs with AVX512 but not SHA extensions), providing SIMD rotates as well as shifts, for example https://github.com/minio/sha256-simd)

It was hoped that Intel's Skylake microarchitecture would have them, but it doesn't. The first Intel CPUs with them were the low-power Goldmont in 2016, then Goldmont Plus in 2017. Intel's first mainstream CPU with SHA extensions will be Cannon Lake; Skylake / Kaby Lake / Coffee Lake do not have them.

AMD Ryzen (2017) has the SHA extensions.

A C/C++ programmer is probably best off using OpenSSL, which will use whatever CPU features it can to hash quickly. (Including SHA extensions on CPUs that have them, if your version of OpenSSL is new enough.)
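
As a rough illustration of "let the library pick the fastest code path", here is a minimal sketch. It uses Python's `hashlib` rather than the OpenSSL C API, on the assumption that your CPython build links `hashlib` against OpenSSL (the usual case, but not guaranteed). The helper name `sha256_file` and the chunk size are made up for this example.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Hypothetical helper: the name and the 1 MiB default chunk size are
    illustrative choices, not anything prescribed by OpenSSL.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # The backing library decides internally which implementation
            # (scalar, SIMD, or SHA extensions) processes each chunk.
            h.update(chunk)
    return h.hexdigest()
```

The calling code is identical whether or not the CPU has SHA extensions, which is the main reason to prefer a maintained library over hand-written intrinsics.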

Peter Cordes
voidlogic
  • 3
    But note that these instructions are being added to the Skylake microarchitecture, which is not expected to be released until perhaps 2015 or 2016. –  Feb 03 '14 at 18:00
  • 5
    Now that SHA-1 is being phased out (http://googleonlinesecurity.blogspot.com/2014/09/gradually-sunsetting-sha-1.html) Intel's new instructions are kinda pointless... – Dima Tisnek Dec 09 '14 at 20:39
  • SHA-1 is still baked permanently into systems like Git version control – Nayuki Feb 02 '17 at 00:00
  • SHA-1 is still part of the boot process (many files in Windows are only signed and checked with SHA-1 for Authenticode, not SHA-256, during the secure boot process). This is still true with the most recent versions of Windows 10 (including Insider and Enterprise versions). Accelerating it could speed up the boot. – verdy_p Nov 15 '19 at 01:53
  • Also, SHA-3 is not supported for now in TPM 2.0 for UEFI boot. SHA-2 is not supported for TPM 1.2 (incompatible with new requirements for UEFI secure boot), which can only use SHA-1 for the TCL measurement logs (but still used in aggregate containers that must then be re-signed with SHA-2) – verdy_p Nov 15 '19 at 01:57
14

Are there in x86 any instructions to accelerate SHA (SHA1/2/256/512) encoding?

It's November 2016 and the answer is finally Yes. But it's only SHA-1 and SHA-256 (and by extension, SHA-224).

Intel CPUs with SHA extensions hit the market recently. It looks like the processors that support them are those with the Goldmont microarchitecture:
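
To see why SHA-224 comes along "by extension": it runs the same compression function as SHA-256 and differs only in its initial state and in truncating the output to 224 bits. The following is a slow, pure-Python sketch meant only to show that relationship, not something to use in production. All helper names (`_compress`, `_digest`, and so on) are made up here, and the constants are derived from prime roots as the SHA-2 specification describes instead of being typed in.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(n):
    """First n primes by trial division (n is tiny here)."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(n):
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


P = _primes(64)
# Round constants: first 32 fractional bits of the cube roots of the primes.
K = [_icbrt(p << 96) & MASK for p in P]
# SHA-256 IV: first 32 fractional bits of sqrt of primes 1..8.
IV_SHA256 = [math.isqrt(p << 64) & MASK for p in P[:8]]
# SHA-224 IV: second 32 fractional bits of sqrt of primes 9..16.
IV_SHA224 = [math.isqrt(p << 128) & MASK for p in P[8:16]]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def _compress(state, block):
    """One 64-byte block; this is the part the SHA extensions accelerate."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def _digest(data, iv, out_words):
    """Pad, run the shared compression function, keep out_words words."""
    padded = (data + b"\x80" + b"\x00" * ((55 - len(data)) % 64)
              + struct.pack(">Q", 8 * len(data)))
    state = list(iv)
    for off in range(0, len(padded), 64):
        state = _compress(state, padded[off:off + 64])
    return struct.pack(">8I", *state)[:4 * out_words]


def sha256(data):
    return _digest(data, IV_SHA256, 8)


def sha224(data):
    # Same rounds and constants; only the IV and the output length differ.
    return _digest(data, IV_SHA224, 7)
```

Comparing `sha224(b"abc")` with `hashlib.sha224(b"abc").digest()` is a quick sanity check. SHA-512, by contrast, works on 64-bit words with 80 rounds, so it cannot reuse the same hardware rounds.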

  • Pentium J4205 (desktop)
  • Pentium N4200 (mobile)
  • Celeron J3455 (desktop)
  • Celeron J3355 (desktop)
  • Celeron N3450 (mobile)
  • Celeron N3350 (mobile)

I looked through offerings at Amazon for machines with the architecture or the processor numbers, but I did not find any available (yet). I believe Acer had one laptop with a Pentium N4200, expected to be available in December 2016, that would meet testing needs.

For some of the technical details of why it's only SHA-1, SHA-224 and SHA-256, see crypto: arm64/sha256 - add support for SHA256 using NEON instructions on the kernel crypto mailing list. The short answer is that, above SHA-256, things are not easily parallelizable.


You can find source code for both Intel SHA intrinsics and ARMv8 SHA intrinsics at Noloader GitHub | SHA-Intrinsics. They are C source files, and provide the compress function for SHA-1, SHA-224 and SHA-256. The intrinsic-based implementations increase throughput approximately 3× to 4× for SHA-1, and approximately 6× to 12× for SHA-224 and SHA-256.

Nayuki
jww
  • 1
    I looked for discussion of SHA-512 in that link, but a text search for "512" didn't find anything. Which part of the post were you talking about? And is what you can do with software and normal scalar or SIMD integer instructions (i.e. most or all of this patch) relevant to what Intel could implement in hardware? Or is the fact that vector registers aren't yet 512b wide relevant? (I don't really know how SHA works.) – Peter Cordes Nov 03 '16 at 13:06
  • @Peter - Check Andy Polyakov's comments. He does a much better job at explaining it than I would para-phrasing it. – jww Nov 03 '16 at 15:40
  • 1
    Support is implemented in OpenSSL: [see crypto/sha/asm/sha1-x86_64.pl](https://github.com/openssl/openssl/blame/5071824/crypto/sha/asm/sha1-x86_64.pl#L388) – Janus Troelsen Jan 14 '17 at 10:14
  • For those asking, Andy Polyakov's comments are found at https://lore.kernel.org/linux-arm-kernel/a11d5dcc-1681-7e62-c3ac-f2fd53c50f14@openssl.org/ – Mingye Wang May 11 '21 at 12:18
11

2019 Update:

OpenSSL does use H/W acceleration when present.

On Intel's side, the Goldmont µarch (Atom series) has SHA-NI support, as do CPUs from Cannon Lake (desktop/mobile, 10 nm) onwards; Cascade Lake server CPUs and older do not support it. Yes, support is non-linear on the timeline because of the parallel CPU/µarch lines.

In 2017 AMD released their Zen µarch, so all current server and desktop CPUs based on Zen fully support it.


My benchmark of OpenSSL speed SHA256 showed a 550% speed increase with a block size of 8KiB.

For real 1 GB and 5 GB files loaded into RAM, the hashing was roughly 3× faster.

(Benchmarked on Ryzen 1700 @ 3.6 GHz, 2933CL16 RAM; OpenSSL: 1.0.1 no support vs 1.1.1 with support)
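
If you want a quick number for your own machine, here is a minimal sketch of a throughput measurement. It is not the `openssl speed` tool: it streams 8 KiB blocks through Python's `hashlib` (assumed here to be backed by OpenSSL), so interpreter overhead is included and results will not match the figures in this answer. The function name and default sizes are made up for the example.

```python
import hashlib
import time


def sha_throughput_mib_s(algorithm="sha256", block_size=8192,
                         total_bytes=64 * 1024 * 1024):
    """Feed total_bytes of zeros to the hash in block_size pieces.

    Returns throughput in MiB/s. Hypothetical helper for illustration.
    """
    block = bytes(block_size)          # one reusable block of zero bytes
    rounds = total_bytes // block_size
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(rounds):
        h.update(block)
    h.digest()                         # include finalization in the timing
    elapsed = max(time.perf_counter() - start, 1e-9)
    return rounds * block_size / (1024 * 1024) / elapsed


if __name__ == "__main__":
    for name in ("sha1", "sha256", "sha512"):
        print(f"{name:7s} {sha_throughput_mib_s(name):10.1f} MiB/s")
```

Running it on a CPU with and without SHA extensions (or against two OpenSSL versions, as in the benchmark above) should show the gap for SHA-1 and SHA-256 but not for SHA-512.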


Absolute values for comparison against other hash functions:

sha1   (1.55 GHz):  721.1 MiB/s
sha256 (1.55 GHz):  668.8 MiB/s
sha1   (3.8 GHz) : 1977.9 MiB/s
sha256 (3.8 GHz) : 1857.7 MiB/s

See this for details until there's a way to add tables on SO.


CPUID identification, page 298: 07h in EAX → EBX Bit 29 == 1.
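
The bit test itself is simple. Here is a small sketch, assuming you already have the EBX value returned by CPUID leaf 07h (sub-leaf 0) from some other source, since Python cannot execute CPUID directly. As a convenience on Linux it also looks for the `sha_ni` flag in `/proc/cpuinfo`. Both function names are invented for this example.

```python
SHA_EBX_BIT = 29  # CPUID.(EAX=07h, ECX=0):EBX bit 29 signals SHA extensions


def ebx_reports_sha(ebx):
    """True if bit 29 is set in the given leaf-7 EBX value."""
    return bool((ebx >> SHA_EBX_BIT) & 1)


def linux_cpuinfo_has_sha(path="/proc/cpuinfo"):
    """Check the kernel's decoded feature flags for 'sha_ni'.

    Returns True/False when a 'flags' line is found, or None when the
    file is missing or has no such line (non-Linux or non-x86 systems).
    """
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return "sha_ni" in line.split(":", 1)[1].split()
    except OSError:
        return None
    return None


if __name__ == "__main__":
    print("sha_ni in /proc/cpuinfo:", linux_cpuinfo_has_sha())
```

In C or C++ you would normally get the EBX value from the compiler's CPUID intrinsic and apply the same bit test.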

Intel's Instruction Set Reference, page 1264ff.

Agner Fog's Instruction tables where he benchmarks instruction latency/µops etc. (currently Zen, Goldmont, Goldmont Plus available)

Code example, SIMD comparison: minio/sha256-simd

BotOfWar
  • 2
    There was one laptop cannonlake chip released, i3-8121U, which [according to WikiChip](https://en.wikichip.org/wiki/intel/core_i3/i3-8121u) definitely does have the SHA extension. http://users.atw.hu/instlatx64/GenuineIntel0060663_CannonLake_InstLatX64.txt confirms it, too, with decoded CPUID results and perf numbers for the SHA instructions (7c latency / 3c tput for SHA1RNDS4). Ice Lake (https://en.wikichip.org/wiki/intel/microarchitectures/sunny_cove) is still expected to continue to support SHA extensions; wikichip just mentions instructions added vs. the previous uarch. – Peter Cordes May 02 '19 at 21:25
  • @Peter I was unable to find a mention of SHA-NI support anywhere on Intel Ark/docs or find a spec sheet for 2nd Gen Xeon Scalable. I found [this document](https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf), p.15 (dated Apr 4th) only stating "TBD / Cannon Lake and later" and [Intel QuickAssist](https://software.intel.com/en-us/articles/intel-xeon-processor-scalable-family-technical-overview) where some functions can be offloaded to C620-series chipset. Reached out to Intel support for clarification. – BotOfWar May 14 '19 at 01:32
  • Oops, Wikipedia is probably wrong about Cascade Lake. https://en.wikichip.org/wiki/intel/xeon_platinum/8260 doesn't list SHA for that CSLX Xeon, and neither does https://ark.intel.com/content/www/us/en/ark/products/192474/intel-xeon-platinum-8260-processor-35-75m-cache-2-40-ghz.html. (But Ark doesn't list it as a "no" either.) InstLatx64 doesn't have a CPUID dump for any CSLX CPUs yet, either, and I didn't find anything else online. [Wikichip's general page for CSL](https://en.wikichip.org/wiki/intel/microarchitectures/cascade_lake) doesn't mention SHA. – Peter Cordes May 14 '19 at 04:09
  • 1
    @Peter Intel supp confirmed sha-ni not supported in xeon scalable that are based on Cascade Lake quoting the doc I linked: https://twitter.com/IntelSupport/status/1128705678909083654?s=09 – BotOfWar May 15 '19 at 19:11
  • Ok that's weird. Cannon Lake came out (i3-8121u) *before* Cascade Lake, so "Cannon Lake and later" is hardly unambiguous or clear here! And I think the current plans are not to release any more CNL CPUs, just straight to Ice Lake (Sunny Cove microarch). I see someone's already corrected Wikipedia: https://en.wikipedia.org/wiki/Cascade_Lake_(microarchitecture). – Peter Cordes May 16 '19 at 07:00
-1

Try something open source such as OpenSSL. I have personally used its MD5 hashing functions and they worked pretty well. You might also want to take a look at hashlib2++.

As far as I know, Intel hasn't made a dedicated instruction set for SHA-1 or SHA-2. They may in upcoming architectures, as CodesInChaos indicated in a comment. The major component in most hashing algorithms is the XOR operation, which is already in the instruction set.

Maarten Bodewes
krish
  • Are these libraries faster than the implementation introduced by Intel? The link alexbuisson gave: http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1 – Alex Dec 20 '13 at 22:43
  • I haven't personally used Intel's one. Let me see what I can find. – krish Dec 21 '13 at 15:57