2

I'm trying to create a collision intentionally.

 fun main(args: Array<String>) {
   val india = Country("India1", 1000)
   val india2 = Country("India2", 1000)

   val countryCapitalMap: HashMap<Country, String> = hashMapOf()
   countryCapitalMap.put(india, "Delhi1")
   countryCapitalMap.put(india2, "Delhi2")
 }


class Country(var name: String, var population: Long) {

  override fun hashCode(): Int {
      return if (name.length % 2 == 0) 31 else 95
  }

  override fun equals(obj: Any?): Boolean {
      val other = obj as Country?
      return if (name.equals(other!!.name, ignoreCase = true)) true else false
  }
}

So, I have india and india2 objects. I've overridden equals() and hashCode() methods for Country so that:

  • india.hashCode() == india2.hashCode() --> true
  • india.equals(india2) --> false

According to Collision resolution in Java HashMap and part of the article "Lets put this Country objects in hashmap", if key1 has result of hash(key.hashCode()) equal to the same operation on the key2 then there should be collision.

So, I put breakpoint to see content of countryCapitalMap and see that its size is 2. I.e. it contains two different entries and there is no linkedList. Hence, there is no collision.

My questions are:

Why countryCapitalMap has a size of 2? Why is there no collision?

Why doesn't HashMap creates a LinkedList with two entries with keys that is not equal but have the same hashCode?

gooDiVer
  • 33
  • 4
  • 4
    You have a collision, both entries are stored in the same bin, but you won't see it unless you debug the code of HashMap itself. – Alex Sveshnikov May 15 '22 at 14:58

1 Answers1

13

You are confusing a collision - case when hashes of keys are the same (or to be precise, when hashes are withing the range corresponding to the same bucket of HashMap), with duplicated keys (i.e. keys that are identical according to the equals() method). These are two completely different situations.

Under the hood, HashMap maintains an array of buckets. Each bucket corresponds to a range of hash values.

If hashes of the keys collide (but keys are not equal) entries (Nodes to be precise) will be mapped to the same bucket and form a linked list, which after a certain threshold will turn into a tree.

Conversely, an attempt to add a new entry with a key that is already present in a map (i.e. duplicated key according to the equals() method) will result in updating the value of the existing entry.

Because as you've already observed, india.equals(india2) will return false, HashMap will contain two entries.

And since the hashes of india and india2 are identical, you've succeeded in your intention to create collision. Both entries will be added, both will end up in the same bucket (i.e. their nodes will form a linked list).

If you wonder how exactly this linked list looks like, take a look at the source code of the HashMap class. And you'll find that there's a field Node<K,V>[] table - an array of buckets, and there's an inner class Node:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

And each node holds a reference next pointing to the next node mapped to the same bucket (if any).

Sidenotes :

  • The hash-function that you've provided in the example is a bad one. Sure, that's clear that you did that on purpose (to ensure that two keys would collide). But it's important to point out that the proper implementation of the equals/hashCode contract is crucial for every object that is intended to be used with Collections framework. And for hash-based collections like HashMap and HashSet the implementation of hashCode() is significant to perform well. If hash-function generates a lot of collisions, as a consequence many entries could appear in the same buckets meanwhile a large number of buckets could remain unoccupied. And depending on a load factor (ratio between the occupied and total number of buckets) your collection might never get resized which will lead to degradation of performance.
  • Another thing related to hashes that worth to mention is that the hashCode() of the key will be invoked only once, while a new node gets created. Take a look at the code above, and you'll notice that the field hash is marked as final. That's done on purpose, because computing hash some objects can be costful. Which means that if the state of this key changes a HashMap will be unaware of that change and will be unable to recognize the same key, i.e. if the previous and new hash will differ equals() method would not be invoked and HashMap will allow the same key to appear twice. That leads to another rule of thumb: an object that is used as a key should be immutable.
Alexander Ivanchenko
  • 25,667
  • 5
  • 22
  • 46
  • May be worth adding that while the current implementation of `java.util.HashMap` may use linked lists to store chains of elements with the same hash code, that's not the only way to implement a hash table; it could store chains in various other data structures, or it could store all entries directly in the hash table and use various types of probing to locate elements. Wikipedia has [more info on this](https://en.wikipedia.org/wiki/Hash_table#Collision_resolution). But this is all just implementation detail: it may affect performance, but the user will always see the same results. – gidds May 16 '22 at 21:31