The main problem with using hashCode() % capacity
is that the result can be negative as well as positive: Java's % takes the sign of the dividend, so a negative hash code produces a negative, invalid array index.
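To see this, try a small capacity and a negative hash code (the values here are arbitrary):

int capacity = 16;
int h = -17;                       // hashCode() can be any int, including negatives
System.out.println(h % capacity);  // prints -1, an invalid array index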
HashMap avoids this issue by keeping its capacity a power of 2, which lets it replace % with a cheap bit mask:
public int getIndex(K key) { return hash(key.hashCode()) & (capacity - 1); }
If the capacity is not a power of 2, you can instead clear the sign bit (the high bit, which is often not very random anyway) so the remainder is always non-negative:
public int getIndex(K key) { return (hash(key.hashCode()) & 0x7FFFFFFF) % capacity; }
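As a quick sanity check, both variants stay within [0, capacity) even for extreme hash codes. This sketch applies the two index calculations to raw int values directly, skipping the hash() mixing step shown below:

int capacity = 16; // must be a power of two for the mask version
for (int h : new int[] { 42, -42, Integer.MIN_VALUE, Integer.MAX_VALUE }) {
    int masked = h & (capacity - 1);          // power-of-two mask
    int modded = (h & 0x7FFFFFFF) % capacity; // sign bit cleared, then %
    System.out.println(h + " -> " + masked + " and " + modded);
}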
The hash function applied on top of hashCode() also matters. HashMap uses the following:
/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions. This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
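To see why the mixing matters, feed it hash codes that differ only in their upper bits. With 16 buckets the raw codes all collapse to index 0, while the mixed codes spread out (a minimal sketch using the hash() above):

int capacity = 16;
for (int i = 0; i < 4; i++) {
    int h = i << 16; // the low bits are identical for every key
    System.out.println((h & (capacity - 1)) + " vs " + (hash(h) & (capacity - 1)));
}

This prints index 0 for every raw code but a distinct index for each mixed one.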
I would use this unless you have a good reason not to, e.g. for security reasons. If you have a service which could be the subject of a denial-of-service attack, you will want to use a different hash to stop a malicious user turning your HashMap into a LinkedList. Unfortunately, you have to use a different hashCode() as well: an attacker can create a long list of Strings which all share the same underlying hash code, and since hash() only mixes that value, changing it later is too late.
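The classic construction exploits the fact that "Aa" and "BB" have the same hashCode(), so every string built from those two-character blocks collides with all the others, and no deterministic hash() applied on top can separate them:

// 2^3 = 8 six-character strings, every one with the same hashCode()
String[] blocks = { "Aa", "BB" }; // "Aa" and "BB" both hash to 2112
for (String a : blocks)
    for (String b : blocks)
        for (String c : blocks)
            System.out.println(a + b + c + " -> " + (a + b + c).hashCode());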
Here is a list of strings which all have a hashCode() of 0; there is nothing a hash() function can do about that.
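In fact, any mixing function built from shifts and XORs, including the one above, maps 0 to 0, so such strings all land in bucket 0 no matter what:

System.out.println(hash(0)); // prints 0: every shift of 0 is 0, and 0 ^ 0 == 0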
Why doesn't String's hashCode() cache 0?