 /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Below is the earlier version, from JDK 1.6:

/**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

Can someone explain the benefits of this kind of hashing compared with how it was done in earlier versions of Java? How does it affect speed and the quality of key distribution? I am referring to the new hash function implemented in JDK 8: how was it arrived at, and how does it reduce collisions?

lazydev
  • Could you include a code snippet of how it was done in earlier versions? Particularly, there may be different implementations in different versions. Which one exactly do you mean? – tobias_k Apr 11 '16 at 16:23
  • http://stackoverflow.com/questions/30225054/why-is-there-a-transformation-of-hashcode-to-get-the-hash-and-is-it-a-good-idea and http://stackoverflow.com/questions/33177043/why-and-how-does-hashmap-have-its-own-internal-implementation-of-hashcode-call/33177236 – Tom Apr 11 '16 at 16:25
  • @tobias_k edited the question to include previous version of hashing. – lazydev Apr 11 '16 at 17:15

2 Answers


In situations where the hashCode method is fairly badly behaved, the performance of HashMap can degrade drastically. For example, say your hashCode method only produced values that differ in their upper 16 bits: in a small table, every key would land in the same bucket.

The JDK 8 function addresses this by XORing the hash code with itself shifted right by 16 bits. If the hash codes were well distributed before this, they should still be. If they were badly distributed, this should improve matters.
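To make the claim above concrete, here is a minimal sketch that counts how many distinct buckets a set of hash codes occupies, with no spreading, with the JDK 8 step, and with the JDK 1.6 function quoted in the question. The class name, the helper `distinctBuckets`, the table length of 16, and the two sample inputs are assumptions chosen for illustration; they are not part of the JDK.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

class HashSpreadComparison {
    // JDK 8 spreading step from the question, applied directly to an int hash code.
    static int spread8(int h) {
        return h ^ (h >>> 16);
    }

    // JDK 1.6 supplemental hash from the question.
    static int spread6(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Hypothetical helper: number of distinct buckets the hash codes occupy
    // in a table of length n (n is assumed to be a power of two).
    static int distinctBuckets(int[] hashCodes, int n, IntUnaryOperator spreader) {
        Set<Integer> buckets = new HashSet<>();
        for (int h : hashCodes) {
            buckets.add((n - 1) & spreader.applyAsInt(h));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        int n = 16; // illustrative table length
        int[] sequential = new int[8];   // already well distributed: 0, 1, ..., 7
        int[] highBitsOnly = new int[8]; // badly behaved: varies only above bit 15
        for (int i = 0; i < 8; i++) {
            sequential[i] = i;
            highBitsOnly[i] = i << 16;
        }

        System.out.println("sequential,   no spreading: " + distinctBuckets(sequential, n, h -> h));
        System.out.println("sequential,   JDK 8:        " + distinctBuckets(sequential, n, HashSpreadComparison::spread8));
        System.out.println("sequential,   JDK 1.6:      " + distinctBuckets(sequential, n, HashSpreadComparison::spread6));
        System.out.println("highBitsOnly, no spreading: " + distinctBuckets(highBitsOnly, n, h -> h));
        System.out.println("highBitsOnly, JDK 8:        " + distinctBuckets(highBitsOnly, n, HashSpreadComparison::spread8));
        System.out.println("highBitsOnly, JDK 1.6:      " + distinctBuckets(highBitsOnly, n, HashSpreadComparison::spread6));
    }
}
```

For these particular inputs, the sequential hash codes keep one bucket each under all three variants, while the hash codes that vary only in the upper bits collapse into a single bucket without spreading and get a bucket each with either function. The JDK 8 version reaches that with one shift and one XOR instead of four shifts and four XORs, which is the speed side of the tradeoff the Javadoc mentions.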

OldCurmudgeon
  • But how did we arrive at shifting right by 16 bits, and why not 32? What makes this efficient? I want to know how this choice was settled on. – lazydev Apr 12 '16 at 19:38
  • A hash code is 32 bits, and 16 bits is exactly half of that. Shifting right by 16 and XORing folds the upper half onto the lower half, which brings the most variety into the hash value when the HashMap is small. It's hard to explain clearly in a comment, but think about how to bring the most variety into the hash code regardless of the HashMap's size (it grows as 2^n), and it will click why it is 16. – didxga Jan 03 '21 at 05:40

Here is a good explanation of how HashMap works in Java 8. Below is a snippet from the same blog.

To understand this, we first need to understand how the index is calculated:

Map the hash code to an index in the array. In the simplest way, this could be done by performing a modulo operation on the hash code and the length of the array, such as hash(key) % n. For a non-negative hash, using modulo ensures that index i is always between 0 and n - 1.

i = hash % n;

For HashMap in Java index i is calculated by the following expression:

i = (n - 1) & hash;

In this expression, variable n refers to the length of the table, and hash refers to the hash of the key.
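The relationship between the two index expressions can be checked directly. The following is a small sketch, assuming n is a power of two as it is in HashMap; the class name, method names, and sample values are made up for illustration. It uses `Math.floorMod` (available since Java 8) as the reference, because the plain `%` operator can return a negative result for a negative hash.

```java
class MaskVersusModulo {
    // Index as HashMap computes it; only valid as a modulo when n is a power of two.
    static int indexByMask(int hash, int n) {
        return (n - 1) & hash;
    }

    // Reference: mathematical (always non-negative) modulo.
    static int indexByFloorMod(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        int n = 32; // illustrative table length, a power of two
        int[] samples = {0, 1, 31, 32, 33, -1, -17, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int hash : samples) {
            System.out.printf("hash=%11d  hash %% n=%3d  floorMod=%2d  mask=%2d%n",
                    hash, hash % n, indexByFloorMod(hash, n), indexByMask(hash, n));
        }
    }
}
```

The mask form agrees with `Math.floorMod` for every int when n is a power of two, and it avoids both the division and the negative results that `hash % n` produces for negative hash codes.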

Since we calculate the modulo using a bit mask ((n - 1) & hash), any bit higher than the highest bit of n - 1 will not be used by the modulo. For example, take n = 32 and four hash codes that differ only in their higher bits. When doing the modulo directly, without the hash code transformation, all indexes will be 1, so every key collides. This is because the mask 31 (n - 1), 0000 0000 0000 0000 0000 0000 0001 1111, makes any bit above the lowest five unusable in the number h. In order to use these higher bits, HashMap shifts them 16 positions to the right (h >>> 16) and XORs them with the lowest bits (h ^ (h >>> 16)). As a result, the modulo obtained has fewer collisions.
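The effect described above can be traced on concrete values. The snippet does not list its four hash codes, so the ones below are hypothetical: they share the low five bits 00001 and differ only in the upper 16 bits. The class and method names are also assumptions for illustration.

```java
class SpreadWorkedExample {
    // JDK 8 spreading step, applied directly to an int hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n assumed to be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    // 32-character binary rendering, left-padded with zeros.
    static String bits(int h) {
        return String.format("%32s", Integer.toBinaryString(h)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int n = 32;
        // Hypothetical hash codes: identical low 16 bits, different high 16 bits.
        int[] hashCodes = {0x00010001, 0x00020001, 0x00030001, 0x00040001};
        for (int h : hashCodes) {
            int s = spread(h);
            System.out.println("h      = " + bits(h) + "  index " + index(h, n));
            System.out.println("spread = " + bits(s) + "  index " + index(s, n));
        }
    }
}
```

Without the transformation all four values map to index 1. After spreading they map to indexes 0, 3, 2 and 5, because the upper 16 bits now take part in the masked result.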

Sri9911