
The Javadoc of HashMap's hash method states:

Retrieves the object's hash code and applies a supplemental hash function to the resulting hash, which defends against poor-quality hash functions. This is critical because HashMap uses power-of-two length hash tables that otherwise encounter collisions for hashCodes that do not differ in the lower bits.

What I am not able to understand is:

1) Why does HashMap use power-of-two length hash tables?

The constraint is also stated where the table is declared:

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table;

Why this constraint?

2) What does "otherwise encounter collisions for hashCodes that do not differ in lower bits" mean?

codingenious
    Please ask one specific question in one post. – Deepu--Java Apr 08 '14 at 11:30
This question has already been covered in this post: http://stackoverflow.com/questions/15437345/java-a-prime-number-or-a-power-of-two-as-hashmap-size – BaptisteL Apr 08 '14 at 11:36
It uses a power-of-two size because you can easily pick the bucket with `hashCode & MASK`, where `MASK = 000....111` and the number of 1s equals the exponent of the current power-of-two size. – Andrey Apr 08 '14 at 11:44

2 Answers


The purpose of a hashmap is to very quickly narrow down how many objects you need to look at (ideally 0 or 1) when searching for a specific key.

The general method for HashMap.get(key) is as follows (a code sketch appears after the list):

  1. Call key.hashCode() to get a single integer that represents the object.

  2. Look in a hash "bucket" based on that hashcode, which can contain zero or more entries.

  3. Go through each entry in the bucket and check whether any entry's key .equals(key). If one does, return its value. If no entry in the bucket has a key equal to the one searched for, return null.
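A minimal sketch of that lookup; the names (hash, table, Entry) are illustrative stand-ins for the real JDK internals, not the actual implementation:

V get(Object key) {
    int h = hash(key.hashCode());          // step 1, plus the supplemental hash
    int index = h & (table.length - 1);    // pick a bucket via the power-of-two mask
    // steps 2 and 3: walk the bucket's entry chain looking for an equal key
    for (Entry<K,V> e = table[index]; e != null; e = e.next) {
        if (e.hash == h && key.equals(e.key)) {
            return e.value;                // found an entry with an equal key
        }
    }
    return null;                           // no entry with an equal key
}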

The difference between a good hashmap and a bad hashmap is speed. You have to balance all three of these concerns:

  1. How quickly can you transform the key into a hashcode?

  2. How often do two different keys map to the same hashcode?

  3. How often will you put two keys with different hashcodes into the same "bucket"?

Java's designers have chosen a set of tradeoffs they think balances these concerns best. There is no single right answer, but you do have to choose a specific approach, and write into the documentation what that approach is.

Java's designers likely had some statistical evidence about the typical data added to hashmaps.

They chose to convert hashcode to bucket by extracting the lowest n bits of the hashcode, because those vary more often than the upper bits. They chose extracting bits over another typical method of converting hashcode to bucket (integer remainder after dividing by a prime number) because it's typically a faster operation on the platforms Java is most commonly deployed to.
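Concretely, with a power-of-two length the bucket index is a single bitwise AND, while a prime-length table needs an integer remainder. A sketch, where table and primeLength are illustrative names and the remainder variant is shown only for comparison:

int hash = key.hashCode();

// Power-of-two table: keep only the low n bits of the hash.
// For table.length == 16, table.length - 1 == 0b1111 (0x0F),
// so this keeps the low 4 bits.
int index = hash & (table.length - 1);

// Prime-length alternative: integer remainder, with the sign bit
// cleared so the index is non-negative (java.util.Hashtable uses
// this style of mapping).
int primeIndex = (hash & 0x7FFFFFFF) % primeLength;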

What Java's designers may have found is that step 1, the implementation of hashCode(), is written by Java users, and can often be terrible, returning the same hashCode for lots of objects they want to store in the same hashmap. Imagine if the hashCode was this:

public class MyInteger {
    final int i;
    public MyInteger(int i) {
        this.i = i;
    }
    public int hashCode() {
        return i << 24; // will return 0x00000000, 0x01000000, etc.
    }
    public boolean equals(Object o) {
        // instanceof is false for null, so no separate null check is needed
        return (o instanceof MyInteger) && ((MyInteger) o).i == i;
    }
}

This is what they call "poor quality"; the lower bits of the hashcode don't vary very much. In this pathological implementation, the lower 24 bits don't vary at all!

In this case, for hashmaps with fewer than 16,777,216 (2^24) buckets, every single key that could go in the hashmap will go to bucket 0, and every other bucket will be empty.
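You can see this by masking MyInteger's hash codes against a small table length; a hypothetical demo, reusing the class above:

public class BucketDemo {
    public static void main(String[] args) {
        int length = 16;                          // table length, a power of two
        for (int i = 0; i < 5; i++) {
            int h = new MyInteger(i).hashCode();  // 0x00000000, 0x01000000, ...
            // The low 24 bits are always zero, so the masked index is always 0.
            System.out.println(h & (length - 1));
        }
    }
}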

Other people's hashcodes may not be as bad as this, but they're bad enough that Java's designers chose to apply a second, supplemental hash, improving the chance that two different keys will go into two different buckets and thus reducing how many objects need to be checked for equality each time a given key is retrieved.
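That supplemental hash is the method whose Javadoc is quoted in the question. In JDK 6 it looks roughly like this (a sketch based on the OpenJDK source; the exact shifts may differ by version):

static int hash(int h) {
    // XOR copies of the high bits down into the low bits, so hash codes
    // that differ only in their high bits still land in different buckets
    // after the power-of-two index mask is applied.
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Applied to MyInteger's hash codes (0x01000000, 0x02000000, ...), the shifts spread the one varying byte into the low bits, so the masked bucket index varies too.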

Stuart Caie

When the HashMap needs to resize, it creates a new array of buckets. Buckets are selected using hashCode() (with minor additional manipulation to map the int hashCode onto the number of buckets in the HashMap).
A power-of-two size for this array allows for a simple mapping of the int hashCode to a bucket number: only the lower bits of the hashCode are used (the higher bits are masked off) to address the buckets.
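A sketch of how that mask behaves across a resize, with illustrative values:

// Doubling a power-of-two table just adds one more bit to the mask.
int oldLength = 16, newLength = oldLength << 1;  // 16 -> 32
int h = 0x5A3C;                                  // some hash code
int oldIndex = h & (oldLength - 1);              // low 4 bits: 12
int newIndex = h & (newLength - 1);              // low 5 bits: 28
// newIndex is either oldIndex or oldIndex + oldLength, so each old
// bucket splits across at most two buckets in the new table.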

Germann Arlington
    This doesn't answer anything. It could use another growth function. "Some clever mapping" is not a serious explanation. – Andrey Apr 08 '14 at 11:39
  • @Andrey It leaves some space for additional research by the OP and future readers. I can be more specific in saying that "clever mapping" is basically using just the lower part of hashCode (masking the higher part) to address the buckets. – Germann Arlington Apr 08 '14 at 11:44