
Possible Duplicate:
Understanding strange Java hash function

static int hash(int h) {   
    // This function ensures that hashCodes that differ only by   
    // constant multiples at each bit position have a bounded   
    // number of collisions (approximately 8 at default load factor).   
    h ^= (h >>> 20) ^ (h >>> 12);   
    return h ^ (h >>> 7) ^ (h >>> 4);   
} 

I don't quite understand the principle behind this implementation. Any explanation, or any resource I could refer to? Thanks!

UPDATE

Thanks all for the answers and resources. Actually I understand how hashing works in general, but I don't know why this code ensures a bounded number of collisions, as the comment claims. Is there any theoretical justification?

l4rmbr

  • http://stackoverflow.com/questions/9335169/understanding-strange-java-hash-function – xiaofeng.li Nov 19 '12 at 10:11
  • The principle is to a) pseudo-randomly rearrange the bits of a number b) without using much CPU c) while giving a uniform distribution. This shifts the number by a selection of bit ranges in combination of 4,7,12,16,19,20,24 and 27 in a compact manner. – Peter Lawrey Nov 19 '12 at 10:12
  • This post: [what-hashing-function-does-java-use-to-implement-hashtable-class](http://stackoverflow.com/questions/9364134/what-hashing-function-does-java-use-to-implement-hashtable-class) might help you. – Rohit Jain Nov 19 '12 at 10:13

3 Answers


The main goal is to produce very different values for similar inputs and thereby minimize the number of collisions. http://en.wikipedia.org/wiki/Hash_function

This implementation is just one satisfactory choice among many possible functions.
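As a sketch of that effect (class and variable names are mine, not from the question): two hashCodes that differ only in a high bit land in the same bucket of a power-of-two-sized table, but end up in different buckets after this function has mixed the high bits down.

```java
// Sketch: the supplemental hash separates inputs that differ only in high bits.
public class SpreadDemo {
    // Supplemental hash from the question (java.util.HashMap, JDK 6)
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int tableSize = 16;           // HashMap tables are powers of two
        int a = 1 << 16, b = 1 << 17; // differ only above bit 15
        System.out.println((a % tableSize) + " " + (b % tableSize));
        // → 0 0 : both collide without the extra mixing
        System.out.println((hash(a) % tableSize) + " " + (hash(b) % tableSize));
        // → 1 2 : distinct buckets after mixing
    }
}
```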

Nikolay Kuznetsov

This function just helps HashMap avoid collisions better.

Even if you have a good hashCode implementation, you can still get collisions, simply because the bucket for an object is chosen by taking hashCode % size.

Example:

Assume the size of your HashMap is 20.

  1. You calculated the hashCode() for object1 and got 401. Bucket is 401 % 20 = 1.
  2. You calculated the hashCode() for object2 and got 3601. Bucket is 3601 % 20 = 1.
  3. You calculated the hashCode() for object3 and got 1601. Bucket is 1601 % 20 = 1.

So even though you have three different hashCodes, you get the same bucket for all three objects, which means the complexity of your HashMap lookups is O(n) instead of O(1).

If we apply the function hash to each of the obtained hashCodes, we get:

  1. hash = 395, bucket = 395 % 20 = 15
  2. hash = 3820, bucket = 3820 % 20 = 0
  3. hash = 1577, bucket = 1577 % 20 = 17

Clearly, applying hash as an additional step gives us three different buckets, which preserves constant-time access to your objects.
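For completeness, the numbers above can be reproduced with a short sketch (the class name is mine):

```java
// Recompute the worked example: raw buckets vs. buckets after the extra hash.
public class HashDemo {
    // Supplemental hash from the question (java.util.HashMap, JDK 6)
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int size = 20; // table size from the example
        for (int code : new int[] {401, 3601, 1601}) {
            System.out.println(code + ": raw bucket " + (code % size)
                    + ", hash " + hash(code)
                    + ", mixed bucket " + (hash(code) % size));
        }
        // 401:  raw bucket 1, hash 395,  mixed bucket 15
        // 3601: raw bucket 1, hash 3820, mixed bucket 0
        // 1601: raw bucket 1, hash 1577, mixed bucket 17
    }
}
```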

mishadoff

The >>> operator is a right shift that treats the value as unsigned: it fills with zero bits instead of copying the sign bit.

^ is XOR (exclusive or).

So the line

h ^= (h >>> 20) ^ (h >>> 12);   

means: XOR the original h with h shifted right by 20 bits and with h shifted right by 12 bits, and store the result back in h.

then

h ^ (h >>> 7) ^ (h >>> 4); 

takes the h from above and XORs it with h shifted right by 7 bits and with h shifted right by 4 bits. I'm not 100% sure why exactly those shift distances were chosen, though.
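A quick way to see both operators in isolation (class name is mine):

```java
// Demonstrate signed vs. unsigned right shift, and XOR.
public class ShiftDemo {
    public static void main(String[] args) {
        int h = -8; // a negative value shows the "unsigned" behaviour
        System.out.println(h >> 1);  // arithmetic shift keeps the sign: -4
        System.out.println(h >>> 1); // zero-fill shift: 2147483644
        System.out.println(5 ^ 3);   // XOR: 101 ^ 011 = 110, i.e. 6
    }
}
```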

exussum