While the other answers explain what is going on, OP's comments on those answers lead me to think a different angle of explanation is required.
Simplified example
Let's say you are going to toss 10 strings into a hash map: "A", "B", "C", "Hi", "Bye", "Yo", "Yo-yo", "Z", "1", "2"
You are using HashMap as your hash map instead of making your own hash map (good choice). Some of the stuff below will not follow the HashMap implementation exactly, but will approach it from a more theoretical and abstract point of view.
HashMap does not magically know that you are going to add 10 strings to it, nor does it know what strings you will be putting into it later. It has to provide places to put whatever you might give it... for all it knows, you are going to put 100,000 strings in it - perhaps every word in the dictionary.
Let's say that, because of the constructor argument you chose when making your new HashMap(n), your hash map has 20 buckets. We'll call them bucket[0] through bucket[19].
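If it helps to see this naive model as code, here is a quick Java sketch (Entry, NaiveMap, and the modulo hash-squashing are all illustrative inventions for this answer, not HashMap's real internals):

```java
// A simplified, hypothetical model -- NOT HashMap's real internals.
class Entry {
    final String key;
    final Object value;
    Entry(String key, Object value) { this.key = key; this.value = value; }
}

class NaiveMap {
    Entry[] bucket = new Entry[20]; // bucket[0] through bucket[19], all initially null

    void put(String key, Object value) {
        int n = hash(key);                 // pick a bucket based on the key's hash
        bucket[n] = new Entry(key, value); // flawed: a second key with the same hash overwrites this!
    }

    int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % bucket.length; // squeeze any hash into 0..19
    }
}
```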
map.put("A", value);
Let's say that the hash value for "A" is 5. The hash map can now do bucket[5] = new Entry("A", value);
map.put("B", value);
Assume hash("B") = 3. So, bucket[3] = new Entry("B", value);
map.put("C"), value);
- hash("C") = 19 - bucket[19] = new Entry("C", value);
map.put("Hi", value);
Now here's where it gets interesting. Let's say that your hash function is such that hash("Hi") = 3. So now the hash map wants to do bucket[3] = new Entry("Hi", value);
We have a problem! bucket[3] is where we put the key "B", and "Hi" is definitely a different key than "B"... but they have the same hash value. We have a collision!
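This is not a contrived worry, either - real, distinct Java strings do collide. A classic example is "Aa" and "BB", two different strings with the same hashCode():

```java
public class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112 -- same hash...
        System.out.println("Aa".equals("BB")); // false -- ...but different keys
    }
}
```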
Because of this possibility, the HashMap is not actually implemented this way. A hash map needs buckets that can hold more than 1 entry. NOTE: I did not say more than 1 entry with the same key (we cannot have that); it needs buckets that can hold more than 1 entry, each with a different key. We need a bucket that can hold both "B" and "Hi".
So let's not do bucket[n] = new Entry(key, value); instead, let's have bucket be of type Bucket[] rather than Entry[]. Now we do bucket[n].add( new Entry(key, value) );
So let's change to...
bucket[3].add(new Entry("B", value));
and
bucket[3].add(new Entry("Hi", value));
As you can see, we now have the entries for "B" and "Hi" in the same bucket. Now, when we want to get them back out, we need to loop through everything in the bucket, for example, with a for loop.
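Here is a minimal Java sketch of that bucket-of-entries idea, including the lookup loop (BucketedMap is an illustration for this answer; the real HashMap is more sophisticated, e.g. it resizes itself and can convert long chains into trees):

```java
import java.util.ArrayList;
import java.util.List;

class BucketedMap {
    // Each slot holds a list of entries, so colliding keys can coexist.
    private final List<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    BucketedMap(int size) {
        buckets = new List[size];
        for (int i = 0; i < size; i++) {
            buckets[i] = new ArrayList<>();
        }
    }

    void put(String key, Object value) {
        List<Entry> bucket = buckets[indexFor(key)];
        for (Entry e : bucket) {           // key already present? replace its value
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        bucket.add(new Entry(key, value)); // otherwise append; collisions simply coexist
    }

    Object get(String key) {
        // This is the loop you are seeing in the debugger: everything in
        // the bucket shares a hash index, so we must compare actual keys.
        for (Entry e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    private static class Entry {
        final String key;
        Object value;
        Entry(String key, Object value) { this.key = key; this.value = value; }
    }
}
```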
So the looping is present because of the collisions. Not collisions of key, but collisions of hash(key).
Why do we use such a crazy data structure?
You might be asking at this point, "Wait, WHAT!?! Why would we do such a weird thing like that??? Why are we using such a contrived and convoluted data structure???" The answer to that question would be...
A hash map works like this because of the properties this peculiar setup provides, due to the way the math works out. If you use a good hash function that minimizes collisions, and if you size your HashMap to have more buckets than the number of entries you guess will be in it, then you have an optimized hash map, which will be the fastest data structure for inserts and queries of complex data.
Your HashMap may be too small
Since you say you often see this for-loop iterating over multiple elements in your debugging, your HashMap might be too small. If you have a reasonable guess as to how many things you will put into it, try to set the size to be larger than that. Notice in my example above that I inserted 10 strings but had a hash map with 20 buckets. With a good hash function, this will yield very few collisions.
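For example, if you expect about 10 entries, you could construct the map with some headroom (32 here is just an illustrative number; note that the real HashMap also rounds its capacity up to a power of two and, with the default load factor, resizes once it is about 75% full):

```java
import java.util.HashMap;
import java.util.Map;

public class Presized {
    public static void main(String[] args) {
        // Expecting ~10 entries: ask for comfortably more buckets than
        // entries up front. 32 > 10 / 0.75, so with the default load
        // factor the map should never need to resize for this workload.
        Map<String, Object> map = new HashMap<>(32);
        map.put("A", 1);
        map.put("B", 2);
        // ... few collisions expected with this much headroom
    }
}
```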
Note:
The above example is a simplification of the matter and takes some shortcuts for brevity. A full explanation is slightly more complicated, but everything you need to know to answer the question as asked is here.