Hash collisions in Java's ConcurrentHashMap: which precautions must be taken?

Question

I'm having a bit of an issue understanding how hashes (or rather: hash collisions) are handled by Java's ConcurrentHashMap.

For example, when I'm running each:

"Ea".hashCode()
"FB".hashCode()

... I am presented with the same integer twice: 2236

Now, if instead I use each of these strings as a key in an instance of ConcurrentHashMap, there seem to be no issues.

ConcurrentHashMap<String, Object> testMap = new ConcurrentHashMap<>();
testMap.put("Ea", Math.random());
System.out.println(testMap.containsKey("FB"));

The last line returns false, despite a key with the same hash code value ("Ea") already being present in the map.

How does this work, and more importantly: what precautions do I need to take to prevent hash collisions in my instances of ConcurrentHashMap?

The hashes are used for bucketing, within a bucket `equals` checks are performed. Hash collisions should be avoided but cannot be prevented (see pigeonhole principle). — luk2302, Jan 12 '22 at 12:39
Excuse my bad phrasing! By "prevent", i am asking for a sure-fire way to check for collisions before adding something to the map. I am working on configuration-file based content which is stored in such maps, and i wish to have a way to stop the proccess and warn the user if they used hash-code-equal keys. — jaylawl, Jan 12 '22 at 12:44
Why would you want to? I think you need an experiment. Try defining your own type that hashes everything to zero but has a sensible equals(). Does hashmap still work with it? (spoiler: it should, only slightly slower) Hash collisions are perfectly natural and the library is prepared to deal with them. IOW: collisions only affect performance, not correctness — , Jan 12 '22 at 12:46
Why? That is not your problem to solve. Duplicate hashes are not a problem for you, the end user or the config person. — luk2302, Jan 12 '22 at 12:46
That doesn't make sense. You may have many keys that generate the same hashcode. As far as I know you can't determine for any map, if a given "bucket" has been "created" for housing collisions. All you should worry about are keys that compare equally using `equals()` because then the new value would replace the old one. And there are methods for detect existing keys. But there are also ways for saving values that have identical keys. — WJS, Jan 12 '22 at 12:47
Remember, all the `hashcode` does is provide a head start in looking up a value. It is not the ultimate decider of which value to return. That is still the comparison of the `provided key` to the `key` in the`key/value` pair in the bucket. — WJS, Jan 12 '22 at 12:56
`Why? That is not your problem to solve. Duplicate hashes are not a problem for you, the end user or the config person` That was the motivation for my question. I wasn't aware that hash maps handle this stuff on their own — jaylawl, Jan 12 '22 at 13:00

score 2 · Accepted Answer · answered Jan 12 '22 at 12:43

Javas ConcurrentHashMap uses a HashTable as the underlying data structure to store entries. The way this table deals with collisions is described here: How do HashTables deal with collisions?

Generally you don't need to be concerned about hash collision when using the HashMap and ConcurrentHashMap types of the standard library. Those are guaranteed to not cause problems with keys that have the same hash values.

Hash collisions in Java's ConcurrentHashMap: which precautions must be taken?

1 Answers1