9

is java's hashCode() deterministic?

I try to implement a document search engine that uses the minhashing algorithm and I use hashCode to pre-hash words. Is the same word going to get the same hash every time that I run it?

Is it going to get the same hash even if I run it from a different machine (32 bit vs 64bit)?

etzourid
  • 331
  • 4
  • 18
  • 1
    I won't bet on that... It could even happen that the hash could be related to the address of the object, and then it could change even from one run to the next one... – Basile Starynkevitch May 08 '13 at 15:58
  • See http://stackoverflow.com/questions/1516843/java-object-hashcode-result-constant-across-all-jvms-systems – Annabelle May 08 '13 at 15:58
  • Why not ask a friend to run a sample piece of code and see? Why not post the said small piece of code so we can all do it? :) That being said, I *don't think* that hashCode is consistent between multiple runs, just for that one stay in the VM. – Shark May 08 '13 at 15:59
  • why not use a different hashing algorithm like md5, say – SoWhat May 08 '13 at 16:07
  • What would you hash with MD5? Maybe it's clear to you, but I will explain for the OP: you cannot expand the size of the hash space by applying a deterministic transformation like MD5: if you have only five initial integer values, you will end with only five different MD5 hashes (and increased memory usage). – Stefano Sanfilippo May 08 '13 at 16:14
  • The answer is NO. Object.hashCode() depends on memory location, this changes with each run of the JVM. – Eric Lindauer May 10 '13 at 22:29

3 Answers3

12

It depends on the class you are referring to. Base Object.hashCode implementation is not, since, as stated in the documentation:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

Addresses are not deterministic, consider that sometimes they are even used as a source of entropy.

But, for instance, String has a deterministic hash code determined as follows:

Formula from Wikpedia

(image taken from Wikipedia)

In some cases there is not even a sensible deterministic definition for the hash code.

Stefano Sanfilippo
  • 32,265
  • 7
  • 79
  • 80
  • +1 but you should use [the javadoc](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#hashCode%28%29) as a reference rather than Wikipedia. – assylias May 08 '13 at 16:06
  • 2
    I only stated that the formula image was copied from Wikipedia, not that I used that as a reference. Clarified. – Stefano Sanfilippo May 08 '13 at 16:07
4

The general contract of hashCode is as Javadoc says :

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

Is the same word going to get the same hash every time that I run it?

During the execution of the application, invoking hashCode() on equal words (I assume the word is a String instance and equals() has been overridden in String) should return the same integer.

EDIT Since the javadoc for String.hashCode() specifies how a String's hash code is computed, it is deterministic.

Returns a hash code for this string. The hash code for a String object is 
computed as :
 s[0]*31^(n-1) + s1*31^(n-2) + ... + s[n-1]
AllTooSir
  • 48,828
  • 16
  • 130
  • 164
  • 4
    Your answer is confusing. `hashcode` is well defined and deterministic for Strings, whether the machine is 32 or 64 bit – assylias May 08 '13 at 16:07
  • 1
    @assylias Yes, which can actually be a DoS risk! An attacker can construct an HTTP request with a bunch of strings (env vars and query params) intentionally designed to have the same hash value, turning an ~O(1) hash map effectively into an O(N) linked list. Womp womp. – yshavit May 08 '13 at 16:12
  • 3
    Also, note that there are other classes that do or can have deterministic hashing. For instance, the `List` interface defines its hash based on its elements, so if all of its elements have deterministic hashing (e.g. they're all `String`s), then the `List` will have a deterministic hash, too. – yshavit May 08 '13 at 16:15
3

Speaking of objects in general: it doesn't.

However if you are talking specificially about String, then the hashcode calculation is explicitly specified in the API of String.hashCode():

Returns a hash code for this string. The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)

In other words: you should be able to depend on the hashCode being stable for strings.

Community
  • 1
  • 1
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197