38

I want to know what is the best and fastest implementation of hash algorithms for Java especially MD5 and SHA-2 512 (SHA512) or 256. I want a function to get a string as an argument and return the hash as the result. Thak you.

Edit: This is for getting mapping each URL to a unique hash. Since MD5 is not that reliable in this area, I'm more interested in finding the best & fastest implementation for SHA-2 algorithms. Note that I know even SHA-2 might produce the same hash for some URLs but I can live with that.

Alireza Noori
  • 14,961
  • 30
  • 95
  • 179
  • 1
    I would advise against using MD5... it is very broken. – Zach Langley Mar 27 '11 at 00:48
  • 6
    Is there any particular reason the implementations included with Java are not sufficient for your purpose? I can feed over 100MB/sec through `MessageDigest` using SHA-256 on my laptop. And the odds of even a random MD5 hash collision are such that if you had _billions_ of documents, you'd be more likely struck by a meteor _the exact second_ you read this. For unique identification MD5 is sufficient. It's only when it's used for security purposes and someone is explicitly attempting to cause a collision that you should be concerned (it would be extremely hard for a URL). – WhiteFang34 Mar 28 '11 at 05:07
  • You know, I want to find the BEST implementation out there. If there's a better one than the MessageDigest, why not use it? – Alireza Noori Mar 28 '11 at 09:49
  • 1
    Take a look at [Guava Hashing](https://code.google.com/p/guava-libraries/wiki/HashingExplained), it has a few hash utilities that might be useful. – Vitalii Fedorenko May 13 '15 at 15:02
  • 5
    FYI.. a quick speed comparison of the `MessageDigest` algorithms (left is faster): MD5 > SHA-1 > SHA-256 > MD2. (MD5 about twice as fast as SHA-256.... MD2 is 10x slower) – Pimp Trizkit Mar 02 '18 at 22:54
  • @PimpTrizkit Thank you – Alireza Noori Mar 05 '18 at 13:48

6 Answers6

53

First things first: speed is overrated. You should make measures before declaring that a given algorithm is "too slow". Most of the time, hash function speed makes no noticeable difference anyway. If you have qualms about security, then first select a hash function which is secure enough, and then only worry about performance.

Moreover, you want to hash "strings". A Java String is, internally, a chunk from an array of char values which represent Unicode code points (actually, Unicode 16-bit code units which encode the code points using UTF-16). A hash function takes as input a sequence of bits or bytes. So you will have to make a conversion step, e.g. str.getBytes("UTF-8"), to obtain your string as a bunch of bytes. It is likely that the conversion step will have a non-negligible cost when compared to the hashing itself.

Note: beware of URL-encoding ! In a URL, some bytes can be replaced with sequences beginning with a '%' sign; this is meant to support non-printable characters, but it can be used on "standard" characters as well (e.g., replacing 'a' with '%61'). This means that two strings which are distinct (in the String.equals() sense) may actually represent the same URL (as far as URL processing is concerned). Depending on your situation, this may or may not be an issue.

You should first try to use Java's MessageDigest API with the standard (already installed) JCE provider (i.e. you call MessageDigest.getInstance("SHA-256")), and bench the result. Theoretically, the JCE may map the call to an implementation with "native" code (written in C or assembly), which will be faster than what you can get with Java.

That being said...

sphlib is an opensource implementation of many cryptographic hash functions, in C and in Java. The code has been optimized for speed, and, in practice, the Java version turns out to be faster than what the standard JRE from Sun/Oracle offers. Use this link in case the previous link fails (the main host server is sometimes down for maintenance, as seems to be the case right now)(warning: 10 MB download). The archive also contains a report (which was presented at the second SHA-3 candidate conference in 2010) which gives some measured performance figures on several platforms, for SHA-2 and the 14 "second round" candidates for the upcoming SHA-3.

But you really should make in-situation benchmarks. For instance, effects on L1 cache can have a drastic effect on performance, and cannot be accurately predicted by taking the function code and running it in isolation.

Thomas Pornin
  • 72,986
  • 14
  • 147
  • 189
  • 1
    Thanks. This seems to be what I've been looking for. Very sophisticated answer. (Although I was already planning to do what you suggested but anyway it's good for the future readers). Very good point regarding the URL characters, I'll try to be conscious about it in the project. – Alireza Noori Mar 28 '11 at 13:48
21

Edit: I originally read the question as what's "the fastest hash algorithm" and it has been clarified to be "the fastest implementation of each algorithm". It's a valid question and others have pointed out faster implementations. However unless you're hashing large amounts of data in a short amount of time, it's simply not going to matter very much. I doubt it's usually worth the time and complexity to use something other than what's provided with the standard JCE.

For URL addresses you'd need to be hashing with SHA-256 upward of a million per second on modern hardware to require something faster. I can't imagine most applications needing more than a thousand per second (over 86 million per day), which means the overall CPU time spent hashing would be far less than 1%. So even if you had an infinitely fast hash algorithm you'd only be able to improve overall performance by 1% at best.

Original Answer: Getting both the best and fastest are at odds with each other. The better hashes are generally slower. If you really need speed and security isn't as much of a concern then use MD5. If you need the best security then go with SHA-256 or even SHA-512. You haven't mentioned what you're using it for so it's hard to recommend one or the other. You're probably safest going with SHA-256, as it should be fast enough for most use cases on modern hardware anyhow. Here's how you can do it:

String input = "your string";
MessageDigest digest = MessageDigest.getInstance("SHA-256");
digest.update(input.getBytes("UTF-8"));
byte[] hash = digest.digest();

If you're using this for security purposes, like hashing a password, then you should add salt to the digest as well. If you want a printable string out of the hash, you can encode it back to a string as hex:

static char[] HEX_CHARS = "0123456789ABCDEF".toCharArray();

StringBuilder sb = new StringBuilder(hash.length * 2);
for (byte b : hash) {
    sb.append(HEX_CHARS[(b & 0xF0) >> 4]);
    sb.append(HEX_CHARS[b & 0x0F]);
}
String hex = sb.toString();
WhiteFang34
  • 70,765
  • 18
  • 106
  • 111
  • I understood the question as "the fastest implementation of each algorithm", not the fastest algorithm. – Paŭlo Ebermann Mar 27 '11 at 00:47
  • But +1 for this nice "toHexString" implementation. – Paŭlo Ebermann Mar 27 '11 at 00:48
  • @Paŭlo: ah, you might be right. The OP should probably clarify and give us some insight into the use case. – WhiteFang34 Mar 27 '11 at 01:01
  • Yes, that is exactly what I meant: The fastest implementation for each algorithm. I think the fastest MD5 might be the "Fast MD5" but I've not tested it yet. I edited the question to clarify it more. And by the way, I love the string conversion part of your code. +1 – Alireza Noori Mar 28 '11 at 04:52
  • 1
    Since speed seems to be important, the `toHexString()` implementation could be slightly improved by replacing the hex digits `String` with a `char[]` and using `HEX_DIGITS[(b & 0xF0) >> 4]` instead of calling `charAt()`. This proved to be about 30% faster in a (probably flawed) microbenchmark. – Joachim Sauer Mar 28 '11 at 13:00
  • Thanks. Before your suggestion I used a private constant String in the class and it improved the performance but I will change it to char array. – Alireza Noori Mar 28 '11 at 20:47
  • Updated answer to use `char[]` for hex conversion as suggested for better performance. – WhiteFang34 Mar 28 '11 at 21:32
2

Consider BLAKE2 which is faster and more secure than the hashes mentioned above.

MD5, SHA-1, SHA256, and SHA-512 are susceptible to length-extension.

MD5 and SHA-1 are vulnerable to collisions.

MD5 is vulnerable to chosen-prefix collisions.

SHA-3 and BLAKE2 have no known security issues and can generate digests of varying length.

SHA-3 is fastest when implemented in hardware; BLAKE2 is fastest when using software implementations.

BLAKE2b is optimized for 64-bit platforms and produces digests of any size between 1 and 64 bytes.

BLAKE2s is optimized for 8 to 32-bit platforms and produces digests of any size between 1 and 32 bytes.

Here are benchmarks for AES, MD5, SHA-256, and BLAKE2b.

https://blake2.net/

https://www.cryptopp.com/benchmarks.html

In the first link, BLAKE2b (947 Mbits) is much faster than SHA-256 (413 Mbits) and MD5 (632 Mbits).

In the second link, AES-256 CBC (805 Mbits) and BLAKE2b (776 Mbits) are about equal in speed and faster then SHA-256 (275 Mbits) and MD5 (602) Mbits.

  • Why do you say that Blake is not "vulnerable to collisions"? By the pigeon hole principle, it is impossible for it to be invulnerable to collisions. – Jack G Jul 23 '20 at 20:34
2

Check these out: Lots of SHA / MD5 examples

Also: From same thread: Fast MD5

String hash = MD5.asHex(MD5.getHash(new File(filename)));

Community
  • 1
  • 1
AndyMac
  • 820
  • 6
  • 11
  • 1
    I've seen both before. Fast MD5 may be useful but I'm looking around for some more. And also as mentioned above, I'm looking more for fast SHA-2 implementations. – Alireza Noori Mar 27 '11 at 20:10
2

Another thing to consider is using MD4. It is not as safe as MD5, but is computed even faster. Windows up to XP used to store and exchange passwords in MD4, so we use this hash because it still allows us to provide authentication services to this platform.

Daniel
  • 27,718
  • 20
  • 89
  • 133
0

For a string, just call the hashCode() because is cheaper in memory overhead.

Otherwise I recommend this code for private hash:

public static int hash8(String val) throws UnsupportedEncodingException {
    return hash8(val.getBytes("UTF-8"));
}

public static int hash8(byte[] val) {
    int h = 1, i = 0;
    for (; i + 7 < val.length; i += 8) {
        h = 31 * 31 * 31 * 31 * 31 * 31 * 31 * 31 * h + 31 * 31 * 31 * 31
                * 31 * 31 * 31 * val[i] + 31 * 31 * 31 * 31 * 31 * 31
                * val[i + 1] + 31 * 31 * 31 * 31 * 31 * val[i + 2] + 31
                * 31 * 31 * 31 * val[i + 3] + 31 * 31 * 31 * val[i + 4]
                + 31 * 31 * val[i + 5] + 31 * val[i + 6] + val[i + 7];
    }
    for (; i + 3 < val.length; i += 4) {
        h = 31 * 31 * 31 * 31 * h + 31 * 31 * 31 * val[i] + 31 * 31
                * val[i + 1] + 31 * val[i + 2] + val[i + 3];
    }
    for (; i < val.length; i++) {
        h = 31 * h + val[i];
    }
    return h;
}

FYI: http://lemire.me/blog/2015/10/22/faster-hashing-without-effort/

Daniel De León
  • 13,196
  • 5
  • 87
  • 72
  • Can you please explain the above? – Pravat Panda Jun 21 '16 at 09:36
  • 4
    A reason to avoid using `hashCode()` is that it is not guaranteed to be consistent between JVMs (see [Object.hashCode](https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()) docs). If you are storing the hash anywhere or expect repeatable results use a standard hashing algorithm. – Danny G Mar 21 '17 at 19:23
  • @DannyG - Object.hashCode is not relevant if you're hashing Strings. – Doradus Jun 25 '18 at 20:21