How HashMap really works?

Question

According to this question

how-does-a-hashmap-work-in-java and this

Many Key-Values pairs could be stocked in the same bucket (after calculating the index of the bucket using the hash), and when we call get(key) it looks over the linked list and tests using equals method.

It doesn't sound really optimized to me, doesn't it compare hashCodes of the linked List before the use of equals?

If the answer is NO:

it means most of the time the bucket contains only 1 node, could you explain why ? because according to this logical explanation many differents keys could have the same bucket index.

how the implementation ensure the good distribution of keys ? this probably mean that the bucket table size is relative to the number of keys

And even if the Table Bucket size is equals to the number of keys, how the HashMap hashCode function ensure the good distribution of keys ? isn't a random distribution ?,

could we have more details?

This is answered by the he first answer in the linked question. — Oliver Charlesworth, Aug 22 '15 at 17:54
No it says : The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals(). — Nassim MOUALEK, Aug 22 '15 at 17:55
Which answers your question "does it compare hashcodes of the linked list instead of equals?". (i.e. "no") — Oliver Charlesworth, Aug 22 '15 at 17:58
but its not optimized, many elements could share the same bucket, i update my question — Nassim MOUALEK, Aug 22 '15 at 18:00
Usually this won't happen unless the `hashCode` of your object is poorly implemented, so in the end a simple `put` operation will take O(N), but that's not an issue in `HashMap` implementation, it's in the `hashCode` of the key. — Luiggi Mendoza, Aug 22 '15 at 18:02
so most of the time the bucket contains only one Element, could you explain why ? how the implementation do to distribute keys correctly ? — Nassim MOUALEK, Aug 22 '15 at 18:05

yshavit · Accepted Answer · 2015-08-22T19:04:21.430

7

The implementation is open source, so I would encourage you to just read the code for any specific questions. But here's the general idea:

The primary responsibility for good hashCode distribution lies with the keys' class, not with the HashMap. If the key have a hashCode() method with bad distribution (for instance, return 0;) then the HashMap will perform badly.
HashMap does do a bit of "re-hashing" to ensure slightly better distribution, but not much (see HashMap::hash)
On the get side of things, a couple checks are made on each element in the bucket (which, yes, is implemented as a linked list)
- First, the HashMap checks the element's hashCode with the incoming key's hashCode. This is because this operation is quick, and the element's hashCode was cached at put time. This guards against elements that have different hashCodes (and are thus unequal, by the contract of hashCode and equals established by Object) but happen to fall into the same bucket (remember, bucket indexes are basically hashCode % buckets.length)
- If that succeeds, then, the HashMap checks equals explicitly to ensure they're really equal. Remember that equality implies same hashCode, but same hashCode does not require equality (and can't, since some classes have potentially infinite number of different values -- like String -- but there are only a finite number of possible hashCode values)

The reason for the double-checking of both hashCode and equals is to be both fast and correct. Consider two keys that have a different hashCode, but end up in the same HashMap bucket. For instance, if key A has hashCode=7 and B has hashCode=14, and there are 7 buckets, then they'll both end up in bucket 0 (7 % 7 == 0, and 14 % 7 == 0). Checking the hashCodes there is a quick way of seeing that A and B are unequal. If you find that the hashCodes are equal, then you make sure it's not just a hashCode collision by calling equals. This is just an optimization, really; it's not required by the general hash map algorithm.

edited Aug 22 '15 at 19:04

answered Aug 22 '15 at 18:52

yshavit

42,327
7
87
124

So my intuition was good? it compare hashCode instead of using Equals during the Linked List look over? – Nassim MOUALEK Aug 22 '15 at 18:57
No, it compares _both_. First it compares hashCode as a quick check to rule out most false positives, but then it uses equals to get the true answer. – yshavit Aug 22 '15 at 18:57
yes i know that it would use the equal after, but this mean that my intuition is good and the first post is wrong? – Nassim MOUALEK Aug 22 '15 at 19:01
I'm not sure which post you mean by the first post. – yshavit Aug 22 '15 at 19:01
most answer says that it use Equal only during the Linked List look over – Nassim MOUALEK Aug 22 '15 at 19:02
1

Then yes, they're incorrect if they're talking specifically about the JDK's implementation of HashMap (in any of the JDKs I've seen -- Oracle's for Java 6, 7 and 8). If they're talking about the idea of "hash maps" in general, as a data structure, then those maps only really _need_ to check equality -- the hashCode check is a detail/optimization. – yshavit Aug 22 '15 at 19:06
so could we consider that its a random distribution related to the hashCode? – Nassim MOUALEK Aug 22 '15 at 19:35
Absolutely. If the hash code is bad, so will be the bucketing. In the extreme case, if the hash code is always the same value, the hash map performs like a linked list. In fact, this can be used to make a DoS attack: http://stackoverflow.com/questions/8669946/application-vulnerability-due-to-non-random-hash-functions – yshavit Aug 22 '15 at 20:32

Warren Dew · Answer 2 · 2015-08-22T18:56:22.433

To avoid having to make multiple comparisons in linked lists, the number of buckets in a HashMap is generally kept large enough that most buckets contain only one item. By default the java.util.HashMap tries to maintain enough buckets that the number of items is only 75% of the number of buckets.

Some of the buckets may still contain more than one item - what's called a "hash collision" - and other buckets will be empty. But on average, most buckets with items in them will contain only one item.

The equals() method will always be used at least once to determine if the key is an exact match. Note that the equals() method is usually at least as fast as the hashCode() method.

A good distribution of keys is maintained by a good hashCode() implementation; the HashMap can do little to affect this. A good hashCode() method is one where the returned hash has as random a relationship to the value of the object as possible.

For an example of a bad hashing function, once upon a time, the String.hashCode() method only depended on the start of the string. The problem was that sometimes you want to store a bunch of strings in a HashMap that all start the same - for example, the URLs to all the pages on a single web site - resulting in an inordinately high proportion of hash collisions. I believe String.hashCode() was later modified to fix this.

Ok, but still to me as a random distribution – Nassim MOUALEK Aug 22 '15 at 18:52 — Nassim MOUALEK, Aug 22 '15 at 18:52

M Sach · Answer 3 · 2015-08-22T18:15:54.537

0

dosn't it compares hachCodes of the linked List instead of use the equals

Its not required. hashcode is used to determine the bucket number be it put or get operation. Once you know the bucket number with hashcode and find its a linked list there, then you know that you need to iterate over it and need to check for equality to find exact key . so there is no need of hashcode comparison here

Thats why hashcode should be as unique as as it can be so that its best for lookup.

it means most of the time the bucket contains only 1 node

No . It depend on the uniqueness of hascode. If two key objects have same hashcode but are not equal, then bucket with contain two nodes

edited Aug 22 '15 at 18:15

answered Aug 22 '15 at 18:04

M Sach

33,416
76
221
314

see my extended answer – M Sach Aug 22 '15 at 18:16
according to http://www.javamadesoeasy.com/2015/02/hashmap-custom-implementation.html many different keys could have the same bucket index, it dosn't answer the question – Nassim MOUALEK Aug 22 '15 at 18:19
@NassimMOUALEK: That may be because it's unclear what the question actually **is**... Please read https://en.wikipedia.org/wiki/Hash_table carefully, and then come back with specific/directed questions if certain things still don't make sense. – Oliver Charlesworth Aug 22 '15 at 18:27
i just try to understand exactly how the hashMap is implemented, its just not logical to me that hashmap look over the bucket with Equal, only if it ensure that there is a minimal number of element in it, and the question is how it works to know the size of the Bucket Table to offer a good distribtuion – Nassim MOUALEK Aug 22 '15 at 18:31

score 0 · Answer 4 · answered Aug 22 '15 at 18:56

When we pass Key and Value object to put() method on Java HashMap, HashMap implementation calls hashCode method on Key object and applies returned hashcode into its own hashing function to find a bucket location for storing Entry object, important point to mention is that HashMap in Java stores both key and value object as Map.Entry in bucket which is essential to understand the retrieving logic.

While retrieving the Values for a Key, if hashcode is same to some other keys, bucket location would be same and collision will occur in HashMap, Since HashMap use LinkedList to store object, this entry (object of Map.Entry comprise key and value ) will be stored in LinkedList.

The good distribution of the Keys will depend on the implementation of hashcode method. This implementation must obey the general contract for hashcode:

If two objects are equal by equals() method then there hashcode returned by hashCode() method must be same.
Whenever hashCode() mehtod is invoked on the same object more than once within single execution of application, hashCode() must return same integer provided no information or fields used in equals and hashcode is modified. This integer is not required to be same during multiple execution of application though.
If two objects are not equals by equals() method it is not require that there hashcode must be different. Though it’s always good practice to return different hashCode for unequal object. Different hashCode for distinct object can improve performance of hashmap or hashtable by reducing collision.

Devashish Priyadarshi · Answer 5 · 2018-05-25T20:56:42.597

You can visit this git-hub repository "https://github.com/devashish234073/alternate-hash-map-implementation-Java/blob/master/README.md".

You can understand the working the working of HashMap with a basic implementation and examples. The ReadMe.md explains all.

Including some portion of the example here:

Suppose I have to store the following key-value pairs. (key1,val1) (key2,val2) (key3,val3) (....,....) (key99999,val99999)

Let our hash algo produces values only in between 0 and 5.

So first we create a rack with 6 buckets numbered 0 to 5.

Storing:

To store (keyN,valN):
1.get the hash of 'keyN'
2.suppose we got 2
3.store the (keyN,valN) in rack 2

Searching:

For searching keyN:
1.get hash of keyN
2.lets say we get 2
3.we traverse rack 2 and get the key and return the value

Thus for N keys , if we were to store them linearly it will take N comparison to search the last element , but with hashmap whose hash algo generates 25 values , we have to do only N/25 comparison. [with hash values equally dispersed]

You must put a piece of code and explain from that. This doesn't answer the question, you should put that as a comment. But, the question should be off-topic. — Ratata Tata, May 20 '18 at 02:50

How HashMap really works?

5 Answers5

Linked

Related