
I have been studying the internals of the HashMap implementation.

When adding or getting a value based on a key, the map calculates the key's hash code and then uses it to find the bucket location (or table location/index; correct me if I am wrong).
But it calculates the hash code twice.

In the code snippet below, key.hashCode() is a native method in the Object class, and a further hash method is implemented in the HashMap class itself.
The comment on the hash method explains why it is calculated twice, but I could not understand it.

Can anyone please explain it briefly with a scenario?

int hash = hash(key.hashCode());

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
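
For reference, as far as I understand, the same class then takes the bucket index from the low bits of the mixed hash. Here is a small standalone sketch I put together of how the two computations fit together; the indexFor method mirrors what I see in the source, while the class name and the printed output are just for illustration:

public class IndexDemo {

    // Supplemental hash copied from HashMap (JDK 6/7 era), same as above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Index computation as I understand it from the same class:
    // the table length is always a power of two, so this keeps only the low bits.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        String key = "example";
        int tableLength = 16;              // default initial capacity
        int raw = key.hashCode();          // first step: the key's own hashCode()
        int mixed = hash(raw);             // second step: the supplemental hash
        System.out.println("raw hashCode = " + raw);
        System.out.println("mixed hash   = " + mixed);
        System.out.println("bucket index = " + indexFor(mixed, tableLength));
    }
}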

Thanks.


1 Answer


http://tekmarathon.com/2012/12/04/hashmap-internal-implementation-analysis-in-java/

It means that if the algorithm we wrote for hashCode generation does not distribute/mix the lower bits evenly, it will lead to more collisions. For example, say our hashCode logic is "empId * deptId". If deptId is even, it will always generate even hash codes, because any number multiplied by an even number is even. If we depended directly on these hash codes to compute the index and store our objects into the HashMap, then:

1. the odd slots in the HashMap would always be empty, and
2. because of #1, we would only ever use the even slots, roughly doubling the number of collisions.
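
To make that concrete, here is a small standalone demo; the empId/deptId values and the class name are made up for illustration, and only hash(int) is copied from the question:

import java.util.HashSet;
import java.util.Set;

public class EvenHashDemo {

    // Supplemental hash from the question (JDK 6/7-era HashMap).
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int deptId = 2;               // even department id -> every "hashCode" is even
        int tableLength = 16;         // power-of-two table, as in HashMap

        Set<Integer> rawIndexes = new HashSet<Integer>();
        Set<Integer> mixedIndexes = new HashSet<Integer>();

        for (int empId = 1; empId <= 1000; empId++) {
            int poorHashCode = empId * deptId;                         // always even
            rawIndexes.add(poorHashCode & (tableLength - 1));          // index without mixing
            mixedIndexes.add(hash(poorHashCode) & (tableLength - 1));  // index with mixing
        }

        // Without mixing, only the even slots (at most 8 of 16) can ever be hit;
        // after mixing, the low bits are spread out and odd slots are used too.
        System.out.println("distinct buckets without mixing: " + rawIndexes.size());
        System.out.println("distinct buckets with mixing:    " + mixedIndexes.size());
    }
}

Without the mixing, (empId * deptId) & 15 can only ever be even, so at most 8 of the 16 buckets are usable; after mixing, the odd buckets get used as well.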

It defends against poorly written hash functions. Additionally, similarly valued objects can cause collisions even if they are not the same object. Collisions are bad: they increase the time to find the value associated with a key, because each bucket points to a linked list of entries that must be traversed on retrieval to match the correct key. Even with a good hash function, you still need to "mix the lower bits" to get an even distribution over the power-of-two table length.
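
Here is a small standalone sketch of the "do not differ in lower bits" case from the Javadoc comment in the question. The two int values below are arbitrary illustrative inputs (not from any real key class); they share the same low 4 bits, so with a 16-slot table they would collide if the index were taken from the raw hashCode, but they land in different buckets after the supplemental hash:

public class UpperBitsDemo {

    // Supplemental hash from the question (JDK 6/7-era HashMap).
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int tableLength = 16;
        int a = 1;              // low bits: ...0001
        int b = 1 + (1 << 16);  // same low bits, differs only above bit 15

        System.out.println("a without mixing -> bucket " + (a & (tableLength - 1)));
        System.out.println("b without mixing -> bucket " + (b & (tableLength - 1)));
        System.out.println("a with mixing    -> bucket " + (hash(a) & (tableLength - 1)));
        System.out.println("b with mixing    -> bucket " + (hash(b) & (tableLength - 1)));
    }
}

The exact bucket numbers depend on the inputs; the point is only that the mixing pushes upper-bit differences down into the bits the index actually uses.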


Disclaimer: I've worked heavily with HashMaps for over a year, which is where all the research comes from.

  • Are table and bucket the same? I have seen this terminology many times, so I just need confirmation. – GrandPa Jun 07 '15 at 07:31
  • That is a very clear explanation. So if we override the hashCode method and the implementation is poor and allows more collisions, then it defends against that. If we don't override it, the actual hashCode (roughly an integer equivalent of the object's memory address) is not a poor implementation (I think), so in that case it is just extra overhead, right? – GrandPa Jun 07 '15 at 07:36
  • 1) Depends on the context. Table usually means a hashtable, or a collection based on hashes of key/value pairs. Buckets are the individual indexes of the hashtable (HashMap), named so because they can hold many key/value pairs that have hash collisions. 2) The default Java implementation of hashCode is, of course, imperfect. The second hash ensures that the bits of hashCode() are sufficiently mixed to fit the power-of-two length of the table. It also ensures an even distribution for overridden hashCode() implementations that aren't very well written. – Jun 07 '15 at 07:55
  • Actually, when I checked the Java docs, the bucket array is declared with the variable name table: transient Entry[] table; table = new Entry[default capacity]; and the Entry object is defined as --> Entry(int h, K k, V v, Entry n) { value = v; next = n; key = k; hash = h; }. The variable name table confused me a little, as we generally call it a bucket. Please correct me if I am wrong – GrandPa Jun 10 '15 at 05:31
  • Each of the entries is termed a bucket because it holds the keys whose hashes collide. The table is a collection of buckets. – Jun 10 '15 at 05:43