Understanding what makes for a good hash function is tricky, as there are in fact a great many different functions that are used and for slightly different purposes.
Java's hash tables work as follows:
- They ask the key object to produce its hash code. The implementation of the
hashCode()
method is likely to be of distinctly variable quality (in the worst case, returning a constant value!) and will definitely not be adapted to the particular hash table you're working with.
- They then use the above function to mix the bits up a bit, so that information present in the high bits also gets moved down to the low bits. This is important because next …
- They take the mod of the hash code (w.r.t. the number of hash table array entries) to get the index into the array of hash table chains. There's a distinct possibility that the hash table array will have size equivalent to a power of 2, so the mixing down of the bits in step 2 is important to ensure that they don't just get thrown away.
- They then traverse the chain until they get to the entry with an equal key (according to the
equals()
method).
To complete the picture, the number of entries in the hash table array is non-constant; if the chains get too long the array gets replaced with a new larger array and everything gets rehashed. That's relatively fast and has good performance implications for normal use patterns (e.g., lots of put()
s followed by lots of get()
s).
The actual constants used are fairly arbitrary (and are probably chosen by experiment with some simple corpus including things like large numbers of Integer
and String
values) but their purpose is not: getting the information in the whole value spread to most of the low bits in the value ensures that such information as is present in the output of the hashCode()
is used as well as possible.
(You wouldn't do this with perfect hashing or cryptographic hashing; despite the similar names, they have very different implementation strategies. The former requires knowledge of the key space so that collisions are avoided/reduced, and the latter needs information to be moved about in all directions, not just to the low bits.)