
I'm providing hashes for sets of data in order to fingerprint the data and identify it by hash - this is the core use case for fast hashes like SHA1 and MD5.

In .NET, there is an option to use either the native or the managed implementations of some of these hashes (the SHA variants, anyway). I'm looking for a managed MD5 implementation, and there doesn't appear to be one in the .NET Framework, but I wondered whether the wrapped native CSP is faster anyway, and whether I should just use it, content that there will be no performance problems. The top answer to Why is there no managed MD5 implementation in the .NET framework? indicates that better performance could be the reason a managed variant doesn't exist.

Is this true, and if so, how much faster is the native CSP?
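For context, the fingerprinting in question is just a `ComputeHash` call against the wrapped native CSP. A minimal sketch (the `"abc"` input and the hex rendering are illustrative, not part of the question):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class Md5Fingerprint
{
    static void Main()
    {
        // RFC 1321 test vector: MD5("abc")
        byte[] data = Encoding.UTF8.GetBytes("abc");

        // The wrapped native CSP under discussion.
        using (var md5 = new MD5CryptoServiceProvider())
        {
            byte[] hash = md5.ComputeHash(data);
            // Render the digest as a lowercase hex fingerprint.
            string fingerprint = BitConverter.ToString(hash)
                .Replace("-", "").ToLowerInvariant();
            Console.WriteLine(fingerprint);
            // prints 900150983cd24fb0d6963f7d28e17f72
        }
    }
}
```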

  • The MS implementation of MD5 sucks, at least for short strings. I got a ~10 times speedup by p/invoking OpenSSL. – CodesInChaos Feb 13 '13 at 10:05
  • I'm glad to hear this, since I was interested in testing the OpenSSL implementation. – codekaizen Feb 13 '13 at 10:07
  • Small advertisement: If you want a collision-resistant (neither MD5 nor SHA-1 is), fast hash function to identify files, then you could consider Blake2. It was designed for that scenario. But you'll need a native library for maximal performance. – CodesInChaos Feb 13 '13 at 10:27
  • Cool, I'll take a look. I use Murmur3 for tracking the state of some internal data, but Blake2 looks interesting, especially given the published advisories. – codekaizen Feb 13 '13 at 10:30

1 Answer


Unfortunately, the wrapped native CSP for MD5 - MD5CryptoServiceProvider - is significantly slower than a pure managed implementation. The viewpoint that native code is unequivocally faster than managed code is an obstinate one, but in many cases the opposite is true. This is such a case, at least in head-to-head measurements.

Using the translated reference MD5 implementation by David Anson, I constructed a quick performance test (source) which aims to measure any large differences in performance between the two implementations. While for small data arrays the differences are negligible, as expected, at around 16kB the native implementation starts to show a potentially significant delay - on the order of milliseconds. This might not seem like much, but it is orders of magnitude slower than the pure managed implementation. The difference is maintained as the size of the data being hashed increases, and at the largest tested data array - ~250MB - the difference in CPU time was about 8.5 seconds. Considering that a hash like this is often used to fingerprint very large files, this extra delay would become noticeable, even against the often much larger delays from I/O.
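The test harness was shaped roughly like the following sketch - this is not the linked benchmark source itself, and `ManagedMD5` is a hypothetical stand-in for the translated reference implementation, assumed to derive from `HashAlgorithm` like the built-in hashes:

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

class Md5Benchmark
{
    static void Main()
    {
        var rng = new Random(42);
        // Buffer sizes from 1kB upward, doubling each step.
        for (int size = 1024; size <= 16 * 1024 * 1024; size *= 2)
        {
            byte[] data = new byte[size];
            rng.NextBytes(data);

            // Time the wrapped native CSP.
            long nativeTicks = Time(new MD5CryptoServiceProvider(), data);
            // Hypothetical managed translation (not in the BCL):
            // long managedTicks = Time(new ManagedMD5(), data);

            Console.WriteLine("{0,10} bytes: native {1} ticks", size, nativeTicks);
        }
    }

    // Hash the buffer once to warm up, then time a second pass.
    public static long Time(HashAlgorithm algo, byte[] data)
    {
        using (algo)
        {
            algo.ComputeHash(data); // warm-up pass
            var sw = Stopwatch.StartNew();
            algo.ComputeHash(data);
            sw.Stop();
            return sw.ElapsedTicks;
        }
    }
}
```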

It's not abundantly clear where the delay comes from, since a pure native test (one dispensing with the CSP wrapper and consumption from managed code) was not performed. Given the nearly identical shapes of the graphs on the log scale, it would appear that the managed and native implementations have the same intrinsic performance, but that the native code is "shifted" down, likely due to the cost of interop between native and managed code at runtime. This performance difference between wrapped native CSPs and pure managed implementations has also been reproduced and documented by other investigators.

In addition to answering the question "how much faster is the native implementation?" in this particular case, I hope this evidence prompts more reflection and measurement whenever the question of native vs. managed arises, countering the long-standing and pernicious assumption that native code is always faster and therefore somehow better. Managed code is clearly very fast, even in the performance-sensitive domain of bulk data hashing.

[Figure: MD5 Hash Computation Time, linear and logarithmic scales]

  • `"shifted" down` but in logarithmic space, which corresponds to a constant-factor speed difference. So it's probably a bad native implementation. If done right, native should beat managed measurably on big data. – usr Dec 06 '15 at 23:33
  • Yes, that was the point I'd made: "it would appear that the managed and native implementations have the same intrinsic performance". But why would native code have more performance? If you profile what the jitter outputs and avoid GC collection pauses, you'll get about the same performance unless you specifically target hardware instructions / registers that the CLR jitter doesn't. Managed code isn't just slower for magic reasons; there are specific things that slow it down that can be avoided for perf-sensitive code. – codekaizen Dec 07 '15 at 00:31
  • Well, I disagree that managed and native have the same intrinsic perf, and the data is evidence for that. The data is evidence *against* PInvoke overhead at high data sizes because the frequency of PInvoke calls should approach zero. The .NET JIT is quite poor in general. A good C compiler can outdo it easily. I have investigated managed code gen in detail and it disappoints in many respects. As a practical example, the managed JIT only recently gained support for rotation instructions (this is not released yet). – usr Dec 07 '15 at 00:36
  • Yes, the old jitter is quite poor. RyuJit is already better and having the source open should help it a good deal (hopefully). And, yes, it's not hard to do better when we put some effort into optimization of the native generated code. But my point is this isn't magical. It's not like just because the code is native it is somehow always faster and better, which is a belief I find when talking with other developers. Both can be fast, and speed takes work. Yes managed code starts at a perf disadvantage, and the jitter caps it, but it can ultimately be made fast. – codekaizen Dec 07 '15 at 00:46
  • I compared the old JIT and RyuJIT in detail in like 50 tests and compared the code. RyuJIT is worse across the board except in very few places. Range checks in loops are the biggest improvement. Of course, the JIT can do equal or better. The Hotspot JVM is amazing. The .NET JIT is poor and this seems to be as designed... :( – usr Dec 07 '15 at 13:31
  • I agree, Hotspot is amazing. I still have hope for RyuJIT. – codekaizen Dec 07 '15 at 16:38