
I was going through Java's HashMap `hash()` implementation; it looks like this:

    final int hash(Object k) {
        int h = hashSeed;
        // some checks omitted here
        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);  // >>> is the unsigned right shift
    }

I am not sure why the code below is included, and what advantage is gained by it:

        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);

Or, let me reframe my question: if I remove the above code from the implementation, what is the disadvantage? I understand that it somehow reduces the chance of collisions, but I am not sure *exactly* how.

Can someone help me understand by giving an example, and explain how it will work with and without the above code?

Lav
    What part of the comment don't you understand? – user207421 Mar 19 '13 at 03:08
    How is the ">>>" operator used to avoid collisions even if keys return the same value? What's the significance of 7 & 4? Is it better to use 13 and 7, or some other integers, instead of 7 & 4? – Lav Mar 19 '13 at 03:10
    @Lav: The constants were determined experimentally; there's no real significance to them. (Indeed, for hashing, it's usually a _good_ idea to use numbers ad-hoc with no real pattern to them). But the issue is not when two keys have _exactly_ the same hash code, but to reduce collisions when two keys have the same hash code mod small powers of 2. – Louis Wasserman Mar 19 '13 at 03:15
  • OK, I get a better feel for it now; maybe a concrete example would help? – Lav Mar 19 '13 at 03:19
  • It ensures that hashCodes that differ only by constant multiples at each bit position have a bounded number of collisions (approximately 8 at the default load factor). – Sam I am says Reinstate Monica Mar 19 '13 at 18:12

1 Answer


The Java hash table implementation sizes its table to a power of two rather than a prime. This allows it to use fast bit masking instead of expensive remainder operations, which is generally a good thing, but the drawback is that particularly bad hash functions may produce more collisions than usual. The code you cite mixes the bits of the hash in a way that minimizes those extra collisions.
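As a quick sketch of the effect (the hash codes below are hypothetical, chosen only for illustration): with a 16-bucket table, the bucket index is just `h & 15`, so without mixing, only the low 4 bits of the hash code matter. Hash codes that differ only in higher bits all collide. The mixing step folds the high bits down into the low bits, so those same hash codes spread out:

```java
public class HashMixDemo {
    // The JDK 7 supplemental hash function quoted in the question.
    static int mix(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int buckets = 16; // default HashMap capacity
        // Hash codes that differ only in bits above the low 4:
        int[] hashes = {0x10, 0x20, 0x30, 0x40, 0x50};
        for (int h : hashes) {
            int naive = h & (buckets - 1);      // without mixing: all land in bucket 0
            int mixed = mix(h) & (buckets - 1); // with mixing: spread over buckets 1..5
            System.out.println("0x" + Integer.toHexString(h)
                    + " -> naive bucket " + naive + ", mixed bucket " + mixed);
        }
    }
}
```

A `hashCode()` like `return id * 16;` (for sequential ids) behaves exactly like the hypothetical values above: perfectly distinct hash codes, yet every key would pile into bucket 0 of a 16-bucket table without the mixing step.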

Louis Wasserman
    Yes, I understood "mixes the bits of the hash in a way that minimizes the extra collisions" .. my question is: in what way are the collisions minimized, exactly ... and why are integers like 20, 12, 7, 4 used and not some ... looking for an example of a hashCode implementation that returns the same hash code, but the >>> code above takes it to different buckets. – Lav Mar 19 '13 at 03:18
    The integers were arbitrarily chosen from the possible shift distances between `0` and `31`, which are the only amounts you can shift by, with some experimentation involved. They aren't meaningful in any particular way. – Louis Wasserman Mar 19 '13 at 03:20