Why can index of backing table in java.util.HashMap be the same for two different provided keys?

Question

Internally HashMap has method hash() which defends against poorly written hash-code by applying special function. Next step is that returned value by hash() method is used to calculate index at which new Entry is stored inside backing array called table. It can happen that index is the same for two different keys. For that linked list is used but it is clear to me.

Why can index of backing table be the same for two different keys?

I know that hash-code can be poorly overriden but method hash() states that it protects from hash-code collisions. So why can index of backing table be the same?

EDIT Thanks to all for replies. @Dunkan Jones resize is done automatically when amount of elements you put into HashMap (size) is more or equal to threshhold(calculated according to initialCapacity and loadFactor provided in constructor). Look into method createEntry - size is incremented whenever new Entry is created. My question is why does hash() method + indexFor() method return together the same index for different objects. Due to this same index two Entries are put at the same bucket by means of linked list.

What causes hash() + indexFor() methods return the same index?

I think and can't realise what hash() and indexFor() do by those tricky >>> and & operators?

What hashing in HashMap means ?

Thanks again!

http://stackoverflow.com/questions/730620/how-does-a-hash-table-work — talkol, Jul 23 '13 at 07:22
Check this (http://javarevisited.blogspot.in/2011/02/how-hashmap-works-in-java.html) — Binary01, Jul 23 '13 at 07:32

Duncan Jones · Answer 1 · 2013-07-23T09:58:41.537

1

You are correct, the internal hash() method is used to improve the quality of the results of hashCode(). The internal Javadocs describe why:

Retrieve object hash code and applies a supplemental hash function to the result hash, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits. Note: Null keys always map to hash 0, thus index 0.

However, your underlying question appears to be: Why does the hash map allow multiple values to sit in the same "bucket" rather than just extending the size of the map?

The answer will be performance. It is expensive to recompute all the hashes in the map during a resize operation. Up to a certain point, it will be cheaper to stuff multiple values into the same bucket.

edited Jul 23 '13 at 09:58

answered Jul 23 '13 at 07:43

Duncan Jones

67,400
29
193
254

Thank you for reply! Not, hash() method is used to defend against poorly written hash-code because it takes hash-code and applies special function! Next, value returned by hash() is used to determine index of backing table by method indexFor(). indexFor() uses bitwise & operator applying length of backing table thus returns index within table bounds. Question is why can that index returned by indexFor() be the same. – Volodymyr Levytskyi Jul 23 '13 at 08:06
Thanks,but why indexFor() can generate the same indexes for different results of hash(). But maybe hash() generates the same results from different hash-codes. I know that HashMap grows automatically when there is no enough buckets so there is always enough space and somehow HashMap puts two Entries at the same bucket – Volodymyr Levytskyi Jul 23 '13 at 09:10
@VolodymyrLevytskyi I've edited my answer to hopefully address your actual question. – Duncan Jones Jul 23 '13 at 09:58

Alberto · Accepted Answer · 2013-07-23T12:14:25.950

If I remember correctly, every object that can be a key for a hash map should override hashCode() method, so the general contract (from Javadoc) is

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

In other words:

o1.equals(o2) then o1.hashCode() == o2.hashCode()

Hash maps internally use another hash() function that create another hash code from hashCode() values. This function at some point use modulus (for space efficiency), so different hashCode() values can have equals hash() value (keys are mapped with hash() value).

This is not a problem, because if two keys in the map have equals hash() value, will be compared with equals() method when searched, to ensure they have the same key, and no two objects which, by coincidence, have the same hash code.

Some resource:

Edit after the OP edit

I think indexFor calculates the modulus. The function is

static int  indexFor(int h, int length) {
   return h & (length-1);
}

We know (from theory) that a % b == a & (b - 1) iff b is 2ⁿ. The length field (our "b") is multiple of 2ⁿ:

Applies a supplemental hash function to a given hashCode, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits. Note: Null keys always map to hash 0, thus index 0.

So different hash value can have same modulus, hence different object can have same index.

Why can index of backing table in java.util.HashMap be the same for two different provided keys?

2 Answers2

Edit after the OP edit