Intel processors can compute SHA512 faster thanks to their SIMD instruction-set extensions, and I want to take advantage of that in Ruby. However, Ruby's implementation of SHA512 doesn't appear to use SIMD: https://github.com/ruby/ruby/blob/trunk/ext/digest/sha2/sha2.c. Or does it?

Is there any way to utilize SIMD optimizations when calculating SHA512 in Ruby?

  • You could write one in C. How many hashes are you intending to do here that this would matter? – tadman May 18 '16 at 19:16
  • Reference code: http://www.intel.co.uk/content/dam/www/public/us/en/documents/white-papers/fast-sha512-implementations-ia-processors-paper.pdf - as far as I can see the Ruby library doesn't do it, but if it were to try this, it would need to be designed to work on a range of processors - it would be a lot of work to add the necessary conditionals, detectors for various "levels" of SIMD available etc. – Neil Slater May 18 '16 at 19:18
  • That code you linked to isn’t necessarily used for the SHA implementation in the `digest` module. When Ruby is built it will check for OpenSSL (and Apple’s CommonCrypto) and use one of those implementations if available. – matt May 18 '16 at 20:38
  • @tadman, 8 hashes. and 8 more. and 8 more. –  May 18 '16 at 22:25
  • @Dimon Give me a number here. Billions? Quadrillions? – tadman May 19 '16 at 03:29
  • @tadman, millions. –  May 19 '16 at 05:23
  • You don't need SIMD for millions. Amend your question with performance requirements and where you're at using the existing routines. – tadman May 19 '16 at 05:24
  • @NeilSlater, why? 32 and x64, that's it. –  May 19 '16 at 05:24
  • @tadman, the question isn't whether I need it or not, but whether it exists or not, and if yes, where. –  May 19 '16 at 05:25
  • @tadman, or rather, the current implementation of SHA512 is slow for my needs, I need a faster one. –  May 19 '16 at 05:35
  • Please, speak in terms of specifics. Can you produce a simple Ruby [Benchmark test](http://ruby-doc.org/stdlib-2.3.0/libdoc/benchmark/rdoc/Benchmark.html) that shows what your performance is right now and what level is acceptable for solving your problem? "Too slow" is a meaningless measure. If you need to write a faster one, you'll need this benchmark to prove that your work is actually doing something useful and not regressing in terms of performance. – tadman May 19 '16 at 05:50
  • Stay civil Dimon. I'm asking a reasonable question here. If you insist on being difficult, I'll ask you a simple question, the Stack Overflow Golden Rule: **WHAT HAVE YOU TRIED**? – tadman May 19 '16 at 06:08
  • @Dimon: There are 5 "levels" of SIMD available, depending on processor: None, SSE, AVX, AVX-2 and AVX-512. The source file would only have to support 2 (one of which was "None") and switch between them at compile time. But still it is a lot of effort, you tend to find only specialist speed-is-everything libraries going to that extent (e.g. ffmpeg does this). I looked at the source for OpenSSL and instead those developers chose to use inline assembly for some parts - you might want to check the speed of that, it could be enough for you. – Neil Slater May 19 '16 at 07:02
  • @NeilSlater: There are several levels of [SSE](http://stackoverflow.com/tags/sse/info). SSE2 is baseline for x86-64. Anyway, the [Intel SHA extensions](http://stackoverflow.com/a/21533954/224132) aren't part of AVX or AVX2; they have a separate feature-bit, like the AES-acceleration instructions. – Peter Cordes May 19 '16 at 07:31
  • @PeterCordes, I don't see an issue because I need them for x86-64 which is SSE2 –  May 19 '16 at 17:55

1 Answer

Intel published a paper on SIMD-accelerating SHA512, in Nov 2012.

They say they got ~8.59 cycles/byte for their AVX version, on a Sandy Bridge i7-2600. They didn't publish results for their AVX2 / rorx (BMI2) version, since Haswell hadn't been released yet. I didn't follow the links to the source code; presumably it's C with intrinsics.
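
To put the cycles/byte figure in perspective, here is a rough conversion to throughput. It is only a back-of-the-envelope sketch: the 3.4 GHz clock is an assumption (the i7-2600's base frequency, ignoring turbo), and the helper names are made up for illustration.

```ruby
# Rough conversion from cycles/byte to throughput.
# Assumption: the core runs at a fixed 3.4 GHz (i7-2600 base clock, turbo ignored).
CYCLES_PER_BYTE = 8.59   # figure quoted above for the AVX version
CLOCK_HZ        = 3.4e9  # assumed clock speed

# Bytes hashed per second at a given clock and cycles/byte cost.
def bytes_per_second(clock_hz, cycles_per_byte)
  clock_hz / cycles_per_byte
end

# Seconds needed to hash total_bytes at that rate.
def seconds_for(total_bytes, clock_hz, cycles_per_byte)
  total_bytes / bytes_per_second(clock_hz, cycles_per_byte)
end

mb_per_s = bytes_per_second(CLOCK_HZ, CYCLES_PER_BYTE) / 1e6
puts format("about %.0f MB/s per core", mb_per_s)
```

Under that assumption the figure works out to a little under 400 MB/s per core. Per-call overhead for many short messages is not captured by this estimate.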

To implement it in Ruby's source code, you'd need to handle the case where Ruby is running on a CPU that doesn't support the instruction-set extensions your fast version uses, and fall back to a plain C or SSE2-only version.
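
The detect-and-fall-back idea could be sketched roughly as follows. This is only an illustration in Ruby, assuming a Linux-style `/proc/cpuinfo`; the labels `:avx2_rorx`, `:avx` and `:plain_c` are hypothetical placeholders, not existing Ruby or `digest` APIs. A real implementation would do this in C, typically via CPUID.

```ruby
# Hypothetical dispatch sketch: choose an implementation from CPU feature flags.
# Assumption: Linux-style /proc/cpuinfo with a "flags : ..." line.

# Extract the list of feature flags from cpuinfo text ([] if none found).
def cpu_flags(cpuinfo_text)
  line = cpuinfo_text.each_line.find { |l| l.start_with?("flags") }
  return [] unless line
  line.split(":", 2).last.split
end

# Pick the fastest variant the CPU supports, falling back to plain C.
# The returned symbols are placeholder labels for illustration only.
def pick_sha512_impl(flags)
  if flags.include?("avx2") && flags.include?("bmi2")
    :avx2_rorx
  elsif flags.include?("avx")
    :avx
  else
    :plain_c
  end
end

cpuinfo = File.readable?("/proc/cpuinfo") ? File.read("/proc/cpuinfo") : ""
puts pick_sha512_impl(cpu_flags(cpuinfo))
```

On systems without `/proc/cpuinfo` the sketch simply reports the plain C fallback, which is the safe default.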

Your best bet might be to have Ruby use OpenSSL or a similar library to get hand-tuned versions of SHA-512 and many other functions. Crypto libraries already have hand-tuned asm versions for many different platforms.
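
As a starting point, a minimal sketch for comparing the standard library's `Digest::SHA512` with the `openssl` extension's digest might look like this. The 64-byte message and the iteration count are arbitrary assumptions, and the helper names are made up. Whether the two differ in speed depends on how your Ruby and OpenSSL were built, so measure rather than assume.

```ruby
require "digest"
require "openssl"
require "benchmark"

# SHA-512 via Ruby's digest library.
def sha512_builtin(data)
  Digest::SHA512.hexdigest(data)
end

# SHA-512 via the openssl extension (backed by libcrypto).
def sha512_openssl(data)
  OpenSSL::Digest.new("SHA512").hexdigest(data)
end

data = "x" * 64   # assumed message size, adjust to your workload
n    = 100_000    # assumed iteration count

# Both paths must agree before timing them is meaningful.
raise "digest mismatch" unless sha512_builtin(data) == sha512_openssl(data)

Benchmark.bm(16) do |bm|
  bm.report("Digest::SHA512")  { n.times { sha512_builtin(data) } }
  bm.report("OpenSSL::Digest") { n.times { sha512_openssl(data) } }
end
```

For short inputs, object allocation and call overhead can dominate the timing, so it is worth benchmarking with message sizes that match the real workload.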


With Goldmont (and later microarchitectures; Skylake itself does not have them), Intel introduced new instructions to accelerate SHA-1 and SHA-256. Unfortunately, I don't see anything about being able to use those instructions for SHA-512.

Peter Cordes
  • `Your best bet might be to have ruby use OpenSSL or a similar library` -- for example? –  May 19 '16 at 17:56
  • @Dimon I didn't have any specific alternatives to libopenssl in mind, but a quick google led me to [a decent article](http://www.interworx.com/community/reviewing-openssl-alternatives/) commenting on some. [GNUtls is the other major crypto library I've heard of](https://www.gnutls.org/manual/html_node/Cryptographic-API.html), with a different license and API. `gnutls_hash_fast` is probably the function you want. – Peter Cordes May 19 '16 at 18:38