18

I understand there is a hashing technique is applied to a key to store its value in the memory address.

But I don't understand how the collision is happening here? Which hashing algorithm does Java use to create a memory space? Is it MD5?

Andy
  • 17,423
  • 9
  • 52
  • 69
Sahal
  • 4,046
  • 15
  • 42
  • 68

4 Answers4

54

The basic idea of HashMap is this:

  1. A HashMap is really an array of special objects that hold both Key and Value.
  2. The array has some amount of buckets (slots), say 16.
  3. The hashing algorithm is provided by the hashCode() method that every object has. Therefore, when you are writing a new Class, you should take care of proper hashCode() and equals() methods implementation. The default one (of the Object class) takes the memory pointer as a number. But that's not good for most of the classes we would like to use. For example, the String class uses an algorithm that makes the hash from all the characters in the string - think of it like this: hashCode = 1.char + 2.char + 3.char... (simplified). Therefore, two equal Strings, even though they are on a different location in the memory, have the same hashCode().
  4. The result of hashCode(), say "132", is then the number of bucket where the object should be stored if we had an array that big. But we don't, ours is only 16 buckets long. So we use the obvious calculation 'hashcode % array.length = bucket' or '132 mod 16 = 4' and store the Key-Value pair in a bucket number 4.
    • If there's no other pair yet, it's ok.
    • If there's one with Key equal to the Key we have, we remove the old one.
    • If there's another Key-Value pair (collision), we chain the new one after the old one into a linked list.
  5. If the backing array becomes too full, so we would have to make too many linked lists, we make a new array with doubled length, rehash all the elements and add them into the new array, then we dispose the old one. This is most likely the most expensive operation on HashMap, so you want to tell your Maps how many buckets you'll use if you know it in before.
  6. If somebody tries to get a value, he provides a key, we hash it, mod it, and then go through the potential linked list for the exact match.

An image, courtesy of Wikipedia: The graphics

In this case,

  • there is an array with 256 buckets (numbered, of course, 0-255)
  • there are five people. Their hashcodes, after being put through mod 256, point to four different slots in the array.
  • you can see Sandra Dee didn't have a free slot, so she has been chained after John Smith.

Now, if you tried to look up a telephone number of Sandra Dee, you would hash her name, mod it by 256, and look into the bucket 152. There you'd find John Smith. That's no Sandra, look further ... aha, there's Sandra chained after John.

Petr Janeček
  • 37,768
  • 12
  • 121
  • 145
  • If you find any link that explain with images please put it here.. That will be very helpful .. – Sahal Jun 05 '12 at 09:55
  • 1
    Done. It's from [this article in wikipedia](http://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_list_heads). – Petr Janeček Jun 05 '12 at 10:17
  • 1
    And now, when you know this, the [this is the `String`'s real `hashCode()` implementation](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#hashCode%28%29). [And](http://stackoverflow.com/questions/299304/why-does-javas-hashcode-in-string-use-31-as-a-multiplier) [some](http://stackoverflow.com/questions/1835976/what-is-a-sensible-prime-for-hashcode-calculation) [random](http://stackoverflow.com/questions/3613102/why-use-a-prime-number-in-hashcode) links on the prime number selection in `hashCode()`. – Petr Janeček Jun 05 '12 at 10:38
  • Unfortunately this isn't what `HashMap` actually does. It XORs the high and low 16 bits of the `hashCode()` together, for a start. It isn't what `HashTable` does either. – user207421 Mar 28 '18 at 04:19
  • 2
    @EJP Right. And it xors with 2n-1 instead of using a mod, so keys whose hashes differ only in the high bits would always collide without this, eh, technique. And it can treeify the colliding keys in a single bucket when the linked list for overflow entries gets too long. There's probably a dozen additional implementation details that are helping making `HashMap` really fast in practice. But obviously this answer does not contain them not to confuse the OP. He was a novice at the time, and needed to know the algorithmic basics, not the low-level optimizations, I believe. – Petr Janeček Mar 28 '18 at 07:32
  • Is the hash of key or hash of value being used, when choosing the bucket? – Gayan Weerakutti Aug 17 '18 at 13:40
  • @reversiblean Hash of the key. That way you can then call `map.get(key)` and it will give you the previously stored value, quickly. – Petr Janeček Aug 17 '18 at 13:48
  • So the data stored in the slots are actually references pointing to the address, right? – Toffee Conmigo Oct 22 '20 at 17:20
4

Here Hash does not mean for Hashing techniques such as MD5. Its the HashCode of memory location used to store an Object for a particular key.

Readings:

This is somewhat more clearer explanation of How HashMap works?

Community
  • 1
  • 1
Asif
  • 4,980
  • 8
  • 38
  • 53
  • For example if you have 100 records in the HashTable, and you want to find out a key with 'Men In Black', How fast it finds the value of it? I think it will not take more than one iteration to find it Q(1). What I felt is compiler create a hashcode of 'Men In Black' and it store the value in that particular memory location. Later when we try to retrieve this, it uses the hascode to find the value of it.....Am i missing something ?.... If that is not the case how run time of the key search is one ie.Q(1) ? – Sahal Jun 05 '12 at 09:25
  • 2
    The Key in Map is stored under given position of array (memory). The position is set by RunTime (not compiler), using algorithm that use transformed hashCode of object and array length. Time needed to retrieve element is O(1), that do not require any iteration. – Damian Leszczyński - Vash Jun 05 '12 at 09:32
  • It is not the '`hashCode` of a memory location'. The `hashCode` *yields* the memory location, by a process involving further computation. Remainder of the answer consists merely of links. – user207421 Mar 28 '18 at 04:21
1

As default implementation hashCode() function in Object class return the memory address as the hash which is used as key in HashTable & HashMap.

roottraveller
  • 7,942
  • 7
  • 60
  • 65
ejb_guy
  • 1,125
  • 6
  • 6
  • For example if you have 100 records in the HashTable, and you want to find out a key with 'Men In Black', How fast it finds the value of it? I think it will not take more than one iteration to find it Q(1). What I felt is compiler create a hashcode of 'Men In Black' and it store the value in that particular memory location. Later when we try to retrieve this, it uses the hascode to find the value of it.....Am i missing something ?.... If that is not the case how run time of the key search is one ie.Q(1) ? – Sahal Jun 05 '12 at 09:24
  • 1
    It will look at the memory location where String object with value 'Men In Black' is stored. This memory location becomes the hashcode which is used to retrieve the value. Ideally it should be Q(1) , generally due to collision and for optimizing the memory used, there will be slight variation and for all practical purposes it can be taken as Q(1). The variation will depend on the number of items in bucket at index calculated from hashcode. Its not necessary that all 100 items in hashmap will have unique index. – ejb_guy Jun 05 '12 at 09:42
  • Also, the `String` class has it's own `hashCode()` implementation, it doesn't take the memory location as it, but uses an algorithm to make a hash from the actual characters in the String... – Petr Janeček Jun 05 '12 at 09:47
  • Yaah, here I confused was, For strings it create hash from its own key... But for integer keys ? – Sahal Jun 05 '12 at 10:25
  • @Sahal I think you know this by today, but every object has its own hashCode method, even Integer (where the hashCode == value). – Petr Janeček Mar 28 '18 at 07:20
0

After going through @Slanec's answer, do see the javadoc from Java-8, as there are significant changes. For ex: 'TREEIFY' where the LinkedList is converted to a TreeMap in case the threshold number of entries per bucket (currently 8) is reached.

Biman Tripathy
  • 2,766
  • 3
  • 23
  • 27
  • The question is about `HashMap` and `HashTable`, not about `TreeMap`, which doesn't use hashing at all. – user207421 Mar 28 '18 at 04:23
  • @EJP What the author here meant was that if a large number of collisions happen to a single bin, at one point instead of a singly linked list the colliding keys get restructured into a tree, not unsimilar to a `TreeMap` (with both obvious and unobvious differences, e.g. noncomparable keys are sorted on their `hashCode()`). This helps to protect against catastrophic cases of many collisions where a pre-Java-8 `HashMap` slowly degrades to a linked list. – Petr Janeček Mar 28 '18 at 07:19