There are a couple of reasons why HashMap
cannot be thread safe without some form of synchronization.
Hash collisions
HashMap
handles collisions by creating a linked list in the node. If the size of the list grows past a certain threshold, it transforms it into a tree. This means that in a multithreaded environment, you could have 2 threads writing and reading to a tree or to a linked list, which will obviously be problematic.
One solution would be to re-create the entire linked list or tree in parallel and set it at the hash map node when it's ready (this is basically a "copy-on-write" strategy), but apart of the horrible performance, you would still need to use a lock
when putting data (for the same reason as explained in the next point).[1]
Assigning a new reference to the table
You mentioned that for resizes, we could use the "copy-on-write" strategy and once the new hash table is ready, we set it atomically to the reference of the table. This indeed works in the new Java Memory Model: [2] if the object is immutable (final member variables):
An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.
However, you would still need to synchronize the entire creating of the new hashtable based on the old hashtable. That's because you could have two racy threads resizing the hashtable concurrently, and you would essentially loose one of the put
operations.
This is what is done in the CopyOnWriteArrayList
, for example.
1Note that this lock could have a much smaller granularity than the entire table. In the limit, you could have a separate lock for every node. Using different locks for different sections of a hashtable is a common strategy to achieve thread safety while minimizing thread contention.
2Another side note: assignment of references to objects and primitive types are always atomic except for long
and double
. For these types which are 64bit, a racing thread could see only one of the words of 32bit written. You need to make them volatile to guarantee atomicity.