5

I was reading about the fact that How exactly the HashMap works in java .I found code in the hash method in the HashMap class the hashcode is one of the operand with Shift right zero fill operator .The other operands are like 12 7 4 20. Later some more processing is done on the result .My question is why only these four number are chossen for calculating the value in hash function which actually used for calculating the position in the bucket

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
     int i = indexFor(hash, table.length);
     for (Entry<K,V> e = table[i]; e != null; e = e.next) {
         Object k;
         if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
             V oldValue = e.value;
             e.value = value;
             e.recordAccess(this);
             return oldValue;
         }
     }

     modCount++;
     addEntry(hash, key, value, i);
     return null;
}


static int hash(int h) {
     // This function ensures that hashCodes that differ only by
     // constant multiples at each bit position have a bounded
     // number of collisions (approximately 8 at default load factor).
     h ^= (h >>> 20) ^ (h >>> 12);
     return h ^ (h >>> 7) ^ (h >>> 4);
}
Deepak
  • 2,287
  • 1
  • 23
  • 30

1 Answers1

3

It’s not that “only these four number are chosen for calculating the value in hash function”, the hash code returned by the hashCode method of the key object is the (very important) input. This method in the HashMap implementation just tries to improve this, given the knowledge about how the HashMap will use that value afterwards.

Typical implementations will only use the lower bits of a hash code as the internal table has a size which is a power of two. Therefore the improvement shall ensure that the likelihood of having different values in the lower bits is the same even if the original hash codes for different keys differ in upper bits only.

Take for example Integer instances used as keys: their hash code is identical to their value as this will spread the hash codes over the entire 2³² int range. But if you put the values 0xa0000000, 0xb0000000, 0xc0000000, 0xd0000000 into the map, a map using only the lower bits would have poor results. This improvement fixes this.

The numbers chosen for this bit manipulation, and the algorithm in general are a field of continuous investigations. You will see changes between JVM implementations as the development never stops.

Holger
  • 285,553
  • 42
  • 434
  • 765