11

I was looking at the implementation of HashMap in JDK8. In the get methods, I saw the below line which is used to find the Node that matches with the given key.

if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))

Why is there the need to compare the hash value along with key? Why is the line above not written as:

if (((k = e.key) == key) || (key != null && key.equals(k)))

Is there any explanation on why it is done this way? Thank you.

Nikolas Charalambidis
  • 40,893
  • 16
  • 117
  • 183
  • 1
    @Hulk, Yes that is why key equality is also checked. Isn't in this case hash code equality redundant? – Sandeep Misal Jun 05 '18 at 16:58
  • 1
    Perhaps as an optimization to avoid calling `equals()`. – SLaks Jun 05 '18 at 17:01
  • 1
    comparison of two `int` values is much faster than `Object.equals` especially as the object gets more complex – Peter Lawrey Jun 05 '18 at 17:01
  • @SandeepMisal indeed I misunderstood your question. You are aksing why the hashcodes are compared first? This is indeed a performance optimisation – Hulk Jun 05 '18 at 17:03
  • @PeterLawrey, I see your point. But there is space / time trade-off. Hash value is stored for every entry. As number of entries increases more space is required. So Is it that here it is assumed that equals will take more time always. Not sure? – Sandeep Misal Jun 05 '18 at 17:08
  • @SandeepMisal .equals takes more time even for `Integer` as there is an couple of extra de-references. It can take 1000x longer easily for even simple objects. – Peter Lawrey Jun 05 '18 at 17:10
  • @SandeepMisal wrt hash value taking space: note that this is NOT the main usage for hash code; rather hash code is used for finding location within linear hash space for entry. So it is needed regardless of whether it is used as short-circuiting key equality comparison or not. Since it is required anyway, and is being accessed already (for locating entry that may match) meaning it's in access cache too, it is cheap and esp. effective if there are many collisions. – StaxMan Jun 05 '18 at 17:19

2 Answers2

9

What seems to be causing your confusion, are two things:

1. Comparing the hash values is (often very much) faster than comparing keys directly.

2. In a == operator, the second condition won't be checked if the first is false.

So first the hash values are compared, which is fast:

  • When they are not equal, you know the keys aren't equal as well, and you are done.

  • When they are equal, you don't know if the keys are equal as well, so you must compare keys, which is (relatively) slow.

Since most keys are not equal, most of the time you only compare hashes. Only when keys are equal (or when hashes are equal due to hash collisions) do you compare the keys, which is rare, and thus you have a performance benefit.

Max Vollmer
  • 8,412
  • 9
  • 28
  • 43
  • 2
    Note: The Javadoc for Map suggests "Implementations are free to implement optimizations whereby the equals invocation is avoided, for example, by first comparing the hash codes of the two keys. " – Peter Lawrey Jun 05 '18 at 17:07
3

This is an efficient way to check if two values can possibly be equal.

The contract for hashcode mandates:

JavaDocs of Object.hashCode

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

Therefore, if the hashes are distinct, there is not point in performing further checks.

As HashMap requires the hashcodes for the keys anyway for choosing the buckets to put the entries in, the tradeoff is storing an additional int per entry vs. possibly having to compute it repeatedly and having to execute equals for the keys more often. HashMap is more optimized for fast retrieval and insertion, less so for memory efficiency.


Side Note: HashMap relies on keys not being mutated in any way that would change their "identity" in terms of equals and hashcode - while this may seem obvious, it is not explicitly mentioned in the JavaDocs for HashMap and has led to questions in the past: Are mutable hashmap keys a dangerous practice? - this is covered by the more general Map contract:

Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.

Hulk
  • 6,399
  • 1
  • 30
  • 52