1

Possible Duplicate:
Explanation of HashMap#hash(int) method

   /**
    * Applies a supplemental hash function to a given hashCode, which
    * defends against poor quality hash functions.  This is critical
    * because HashMap uses power-of-two length hash tables, that
    * otherwise encounter collisions for hashCodes that do not differ
    * in lower bits. Note: Null keys always map to hash 0, thus index 0.
    */
   static int hash(int h) {
       // This function ensures that hashCodes that differ only by
       // constant multiples at each bit position have a bounded
       // number of collisions (approximately 8 at default load factor).
       h ^= (h >>> 20) ^ (h >>> 12);
       return h ^ (h >>> 7) ^ (h >>> 4);
   }

Can anyone explain this method in details, thanks.

Community
  • 1
  • 1
Foredoomed
  • 2,219
  • 2
  • 21
  • 39

2 Answers2

1

One of the problems with designing a general-purpose hash-code, is that you put all of this effort into ensuring a nice spread of bits, and then someone comes along and uses it in such a way as to completely undo that.

Let's take a classic example of a co-ordinate class with an X and a Y, both integers.

It's a classic example, because people will use it to demonstrate that X ^ Y is not a good hashcode, because it's common for there to be several objects where X == Y (all hash to 0) or one whose X and Y are the Y and X of the other (will hash the same) and other cases where we end up with the same hash code.

It comes down to the fact that while integers have a possible range covering 4billion values, in 99% of use they tend to be less than a few hundred or a few tens of thousands at most. We can never get away from trying to spread 18quadrillion possible values among 4billion possible results, but we can work to make those we're likely to actually see, less likely to clash.

Now, (X << 16 | X >> 16) ^ Y does a better job in this case, spreading the bits from X around more.

Unfortunately, if the use of this hash is to do % someBinaryRoundNumer to index into a table (or even % someOtherNumber, to a slightly lesser extent), then for values of X below someBinaryRoundNumber - which we can expect to be most common - this hash becomes effectively return Y.

All our hard work wasted!

The rehash used is to make a hash with such logic, slightly better.

Its worth noting that it wouldn't be fair to be too critical of the (X << 16 | X >> 16) ^ Y approach as another use of the hash could have that form superior to a given alternative.

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
0

Well not to enter into to fine details but:

  • due to the hascode() and equals() contracts, a poor implementation of the hashcode function could lead to different instances having the same hashcode. This means that you may have a class wit a crappy hashcode() method implementation, such that the equals() method of the class will return false for the A and B instances (meaning that they are different from the business logic point of view) but the hashcode() method returns the same value for instances A and B. Again, this is technically valid according to the hashcode() and equals() contract, but not very proper

  • in a Hashtable-like structure (like HashMap) "buckets" are used to place instances inside the map according to their hashcode. If two instances have the same hashcode() (but are different according to the equas() method) they will both be placed in the same bucket. This is bad from a performance point of view, because you loose some of the retrieving speed inherent to a Hashtable-like structure when you have a lot of such situations. The are called collisions. What happends is that if later on someone uses a "search" instance to retrieve something from the hashmap, and the corresponding hash bucket has more than one occupant, each element in that bucket must be checked with the equals() method to find out which is the one that needs to be retrieved. In an extreme situation a HashMap can be transformed into a linked list like this.

  • That hash(int n) method adds some extra stuff to the existing hashcode() value in order to defend agains such situations.

Shivan Dragon
  • 15,004
  • 9
  • 62
  • 103