
I was reading the JavaDoc for the Object.hashCode method; it says:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer [...])

But whatever its implementation, the hashCode method always returns a (let's assume positive) integer, so given Integer.MAX+1 different objects, two of them are going to have the same hashcode.

Why is the JavaDoc here "denying" collisions? Is it a practical conclusion given that internal address is used and "come on, you're never going to have Integer.MAX+1 objects in memory at once, so we can say it's practically always unique"?
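To make the pigeonhole argument concrete, here is a small sketch (class name is mine) using Long, whose hashCode is specified in its JavaDoc as `(int)(value ^ (value >>> 32))`: two distinct values that collide by construction.

```java
// Two distinct Long values with the same hashCode, illustrating the
// pigeonhole principle: 2^64 possible longs map onto 2^32 possible ints.
// Long.hashCode() is specified as (int)(value ^ (value >>> 32)).
public class LongHashCollision {
    public static void main(String[] args) {
        Long a = 0L;
        Long b = 4294967297L; // 2^32 + 1

        System.out.println(a.hashCode()); // 0
        System.out.println(b.hashCode()); // 0 as well: (2^32+1) ^ 1 = 2^32, narrowed to int is 0
        System.out.println(a.equals(b));  // false: distinct values, same hashcode
    }
}
```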

EDIT

This bug entry (thank you Sleiman Jneidi) gives an exact idea of what I mean (it seems to be a more than 10-year-old discussion):

appears that many, perhaps majority, of programmers take this to mean that the default implementation, and hence System.identityHashCode, will produce unique hashcodes.

The qualification "As much as is reasonably practical," is, in practice, insufficient to make clear that hashcodes are not, in practice, distinct.

Luigi Cortese
  • You skipped the part "As much as is reasonably practical". Of course, if you have more than 2^32 objects, you are certain to have collisions. – JB Nizet Dec 17 '15 at 11:46
  • This, and you also skipped the part that if two objects are equal they must have the same hash code too – fge Dec 17 '15 at 11:48
  • @JBNizet so you're saying that my understanding is correct? – Luigi Cortese Dec 17 '15 at 11:49
  • @fge how is that related to this? If hashcode always returns "1", equal objects have equal hashcodes too – Luigi Cortese Dec 17 '15 at 11:50
  • hashCode can be negative, so Integer.MAX+1 objects won't necessarily mean two with the same hashcode. I think this should be 2*Integer.MAX+2 (or just 2^32+1) objects (because abs(Integer.MIN) - 1 = abs(Integer.MAX)). But that's being painfully picky... – mixmastered Dec 17 '15 at 11:52
  • @mixmastered I assumed that just to keep it simple – Luigi Cortese Dec 17 '15 at 11:53
  • For me the JavaDoc just means the implementation does its best to avoid collisions "As much as is reasonably practical". It does not say they are impossible. – Henry Dec 17 '15 at 11:59
  • @LuigiCortese yes. as Henry says, the JVM does its best, but can't change the laws of mathematics. – JB Nizet Dec 17 '15 at 12:05

3 Answers


The docs are indeed misleading, and there is a bug opened ages ago that says exactly that. The implementation is JVM-dependent, and in practice, especially with massive heap sizes, collisions are quite likely when mapping object identities to 32-bit integers.
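A sketch of this (class name is mine, and the result is JVM-dependent): on typical HotSpot builds, where identity hashcodes come from a pseudo-random generator, the birthday paradox makes a collision among a couple of million objects all but certain.

```java
import java.util.HashSet;
import java.util.Set;

// Brute-force search for a System.identityHashCode collision.
// JVM-dependent: on HotSpot, identity hashes are pseudo-randomly
// generated, so with ~2^32 possible values the birthday paradox makes
// a collision among a few million objects all but certain.
public class IdentityHashCollision {
    public static void main(String[] args) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < 2_000_000; i++) {
            int h = System.identityHashCode(new Object());
            if (!seen.add(h)) {
                System.out.println("Collision after " + (i + 1) + " objects: " + h);
                return;
            }
        }
        System.out.println("No collision found (unusual identityHashCode scheme?)");
    }
}
```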

Sleiman Jneidi
  • Exactly! I mean, I would have written something like "pay attention! The hashCode method can return the same value for different objects" instead of "yeah, it's practically unique" – Luigi Cortese Dec 17 '15 at 12:02
  • @LuigiCortese It does say that: "It is ***not* required** that if two objects are unequal according to the `equals(java.lang.Object)` method, then calling the `hashCode` method on each of the two objects must **produce distinct integer results**." – Andreas Dec 17 '15 at 12:12
  • @Andreas I misunderstood your sentence. What you cited says that having distinct hashcodes for distinct objects is not a constraint, and it is not a constraint because it's impossible to satisfy with a finite (int) representation. Then, writing "*As much as is reasonably practical, [they are unique]*" is like writing "*What I'm saying is not always true, but [they are unique]*". This is what I find wrong/misleading or whatever – Luigi Cortese Dec 17 '15 at 12:58

There is an interesting discussion of hashcode collisions here:

http://eclipsesource.com/blogs/2012/09/04/the-3-things-you-should-know-about-hashcode/

In particular, this highlights that your practical conclusion, "you're never going to have Integer.MAX+1 objects in memory at once, so we can say it's practically always unique" is a long way from accurate due to the birthday paradox.

The conclusion from the link is that, assuming a random distribution of hashCodes, we only need 77,163 objects before we have a 50/50 chance of hashCode collision.
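That 50/50 figure follows from the standard birthday-paradox approximation. A quick sketch (class name is mine) that reproduces it, assuming hashes uniform over 2^32 values:

```java
// Birthday-paradox estimate: probability of at least one hashCode
// collision among n objects, assuming hashes uniform over 2^32 values:
//   p ≈ 1 - exp(-n*(n-1) / (2 * 2^32))
public class BirthdayCollision {
    public static void main(String[] args) {
        long n = 77_163;
        double pairs = n * (n - 1) / 2.0;      // number of object pairs
        double p = 1.0 - Math.exp(-pairs / Math.pow(2, 32));
        System.out.printf("n = %d -> collision probability = %.3f%n", n, p);
        // prints a probability of approximately 0.500
    }
}
```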

mixmastered
  • Object.hashCode does not produce a random distribution. The generated hashCodes are not random. The javadoc says: "As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects". – JB Nizet Dec 17 '15 at 12:07
  • To me it sounds really wrong to use "reasonably practical" in a programming language context, though – Luigi Cortese Dec 17 '15 at 12:12
  • @LuigiCortese Why? `hashCode` should generate well-distributed numbers without sacrificing performance. An implementation could go for a full-blown crypto-secure digest algorithm, but those are slow, so it should instead use a "reasonably practical" implementation. It may not be as well distributed, but that's ok, because the performance degradation observed when collisions do happen is far outweighed by the better performance of the `hashCode` method. – Andreas Dec 17 '15 at 12:20
  • @Andreas to me, it's acceptable to treat **distribution** as a fuzzy property; you can say that hashcode is "reasonably practically well distributed". But not **uniqueness**: you cannot say hashcode is "reasonably practically unique". It's unique or it's not, nothing lies in the middle. That's the difference IMHO – Luigi Cortese Dec 17 '15 at 12:41

When you read this carefully, you'll notice that it only says the implementation should try to avoid collisions ("as much as is reasonably practical"), and that you are not guaranteed different hashcodes for unequal objects.

So the promise is not very strong, but it is still very useful, for instance when using the hashcode as a quick indication of inequality before doing the full equality check.

Take ConcurrentHashMap, for instance, which uses (a function of) the hashcode to assign a location to an object in the map. In practice the hashcode is used to find roughly where an object is located, and equals is used to find it precisely.

A hashmap could not use this optimization if objects didn't try to spread their hashcodes as much as possible.
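A minimal sketch of that two-step lookup (a hypothetical class, not the real ConcurrentHashMap internals): the hashcode narrows the search to one bucket, then equals does the precise check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified hash-bucket container (not real JDK code):
// hashCode() picks a bucket, equals() finds the exact entry within it.
public class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    public TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    private List<E> bucketFor(E e) {
        // Mask out the sign bit so the index is always non-negative.
        return buckets.get((e.hashCode() & 0x7fffffff) % buckets.size());
    }

    public void add(E e) {
        if (!contains(e)) {
            bucketFor(e).add(e);
        }
    }

    public boolean contains(E e) {
        // hashCode narrows the search to one bucket; equals is the
        // precise check, so colliding hashcodes stay correct, just slower.
        for (E candidate : bucketFor(e)) {
            if (candidate.equals(e)) {
                return true;
            }
        }
        return false;
    }
}
```

If every object returned the same hashcode, all elements would land in one bucket and `contains` would degrade to a linear scan; that is why well-spread hashcodes matter even though collisions are allowed.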

Thirler