You are confusing a collision - case when hashes of keys are the same (or to be precise, when hashes are withing the range corresponding to the same bucket of HashMap
), with duplicated keys (i.e. keys that are identical according to the equals()
method). These are two completely different situations.
Under the hood, HashMap
maintains an array of buckets. Each bucket corresponds to a range of hash values.
If hashes of the keys collide (but keys are not equal) entries (Node
s to be precise) will be mapped to the same bucket and form a linked list, which after a certain threshold will turn into a tree.
Conversely, an attempt to add a new entry with a key that is already present in a map (i.e. duplicated key according to the equals()
method) will result in updating the value of the existing entry.
Because as you've already observed, india.equals(india2)
will return false
, HashMap
will contain two entries.
And since the hashes of india
and india2
are identical, you've succeeded in your intention to create collision. Both entries will be added, both will end up in the same bucket (i.e. their nodes will form a linked list).
If you wonder how exactly this linked list looks like, take a look at the source code of the HashMap
class. And you'll find that there's a field Node<K,V>[] table
- an array of buckets, and there's an inner class Node
:
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
}
And each node holds a reference next
pointing to the next node mapped to the same bucket (if any).
Sidenotes :
- The hash-function that you've provided in the example is a bad one. Sure, that's clear that you did that on purpose (to ensure that two keys would collide). But it's important to point out that the proper implementation of the
equals/hashCode
contract is crucial for every object that is intended to be used with Collections framework. And for hash-based collections like HashMap
and HashSet
the implementation of hashCode()
is significant to perform well. If hash-function generates a lot of collisions, as a consequence many entries could appear in the same buckets meanwhile a large number of buckets could remain unoccupied. And depending on a load factor (ratio between the occupied and total number of buckets) your collection might never get resized which will lead to degradation of performance.
- Another thing related to hashes that worth to mention is that the
hashCode()
of the key will be invoked only once, while a new node gets created. Take a look at the code above, and you'll notice that the field hash
is marked as final
. That's done on purpose, because computing hash some objects can be costful. Which means that if the state of this key changes a HashMap
will be unaware of that change and will be unable to recognize the same key, i.e. if the previous and new hash will differ equals()
method would not be invoked and HashMap
will allow the same key to appear twice. That leads to another rule of thumb: an object that is used as a key should be immutable.