
I was going through Java's HashMap `hash()` implementation; it looks like this:

    final int hash(Object k) {
        int h = hashSeed;
        // some checks omitted here
        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);  // >>> is the unsigned right shift
    }

I am not sure why the code below is included, and what advantage is gained by it:

        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);

Or, let me reframe my question: if I remove the above code from the implementation, what is the disadvantage? I understand that it somehow reduces the chance of collisions, but I am not sure *exactly* how.

Can someone help me understand by giving an example, and explain how it will work with and without the above code?

Lav
    What part of the comment don't you understand? – user207421 Mar 19 '13 at 03:08
    How is the ">>>" operator used to avoid collisions even if keys return the same value? What's the significance of 7 & 4? Is it better to use 13 and 7, or some other integers, instead of 7 & 4? – Lav Mar 19 '13 at 03:10
    @Lav: The constants were determined experimentally; there's no real significance to them. (Indeed, for hashing, it's usually a _good_ idea to use numbers ad-hoc with no real pattern to them). But the issue is not when two keys have _exactly_ the same hash code, but to reduce collisions when two keys have the same hash code mod small powers of 2. – Louis Wasserman Mar 19 '13 at 03:15
  • OK, I get a better feel for it now; maybe a concrete example would help? – Lav Mar 19 '13 at 03:19
  • It ensures that hashCodes that differ only by constant multiples at each bit position have a bounded number of collisions (approximately 8 at the default load factor). – Sam I am says Reinstate Monica Mar 19 '13 at 18:12

1 Answer


The Java hash table implementation sizes its table to a power of two rather than a prime. This allows it to use fast bit masking instead of expensive remainder operations, which is generally a good thing, but the drawback is that particularly bad hash functions may produce more collisions than usual. The code you cite mixes the bits of the hash in a way that minimizes those extra collisions.
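As a quick sketch of the effect (the hash codes below are hypothetical, chosen only for illustration): with a 16-bucket table, the bucket index is just `h & 15`, so without mixing, only the low 4 bits of the hash code matter. Hash codes that differ only in higher bits all collide. The mixing step folds the high bits down into the low bits, so those same hash codes spread out:

```java
public class HashMixDemo {
    // The JDK 7 supplemental hash function quoted in the question.
    static int mix(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int buckets = 16; // default HashMap capacity
        // Hash codes that differ only in bits above the low 4:
        int[] hashes = {0x10, 0x20, 0x30, 0x40, 0x50};
        for (int h : hashes) {
            int naive = h & (buckets - 1);      // without mixing: all land in bucket 0
            int mixed = mix(h) & (buckets - 1); // with mixing: spread over buckets 1..5
            System.out.println("0x" + Integer.toHexString(h)
                    + " -> naive bucket " + naive + ", mixed bucket " + mixed);
        }
    }
}
```

A `hashCode()` like `return id * 16;` (for sequential ids) behaves exactly like the hypothetical values above: perfectly distinct hash codes, yet every key would pile into bucket 0 of a 16-bucket table without the mixing step.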

Louis Wasserman
    Yes, I understood "mixes the bits of the hash in a way that minimizes the extra collisions" .. my question is: in what way are the collisions minimized, exactly ... and why are integers like 20, 12, 7, 4 used and not some ... looking for an example of a hashCode implementation that returns the same hash code, but the >>> code above takes it to different buckets. – Lav Mar 19 '13 at 03:18
    The integers were arbitrarily chosen from the possible shift distances between `0` and `31`, which are the only amounts you can shift by, with some experimentation involved. They aren't meaningful in any particular way. – Louis Wasserman Mar 19 '13 at 03:20