5

I'm aware of the ConcurrentHashMap in Java and its numerous benefits but it's not so clear to me why an implementation like a synchronised HashMap needs to be synchronized on every function call.

To me, it feels like if you had a HashMap whose only operations are put(k, v) and get(k), then only put would need to be synchronized. Even if put triggers a resize, you could still grant safe read access during the resize: the reader thread would simply read from the pre-resize version, and when the resize completes, the writer thread would swap in the new reference so that all subsequent calls to get see the resized HashMap.
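For concreteness, here is a minimal sketch of the scheme described above, under my own assumptions (SnapshotMap is a made-up name, not an existing class): put is synchronized and publishes a fresh copy through a volatile reference, while get reads without locking.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the question's scheme: synchronized writes,
// unsynchronized reads against an immutable snapshot swapped in atomically.
class SnapshotMap<K, V> {
    // volatile so readers always see a fully constructed map after the swap
    private volatile Map<K, V> current = new HashMap<>();

    public synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(current); // copy-on-write
        copy.put(key, value);
        current = copy; // atomic reference swap publishes the new version
    }

    public V get(K key) {
        return current.get(key); // lock-free read of the latest snapshot
    }
}
```

Note that this trades write cost for read speed: every put copies the whole map, which is the point JB Nizet makes in the first comment.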

Am I missing something obvious?

jeremie
  • That would make put() extremely inefficient: it would have to make a copy of the whole HashMap, and then use the copy when it's done. That's the strategy used by CopyOnWriteArrayList, but it's inefficient if you have many writes to the list. – JB Nizet Nov 04 '17 at 11:54
  • That's not how synchronisation works. You need to synchronise the write AND the read, otherwise a read could see stale, or worse, inconsistent values. – assylias Nov 04 '17 at 12:13
  • @assylias, I can see why you would get a stale value. But the inconsistent value, to me, can only happen, if the put operation is performing a mutation on an already existing entry. Is that right? – jeremie Nov 04 '17 at 12:29
  • No, imagine thread 1 creates an object with "new" and puts it into the map. Then thread 2 gets the object from the map - it is totally possible that it could see a valid reference to a partially constructed object. – assylias Nov 04 '17 at 14:05
  • You're just assuming that after any put call the HashMap goes atomically from being consistent without the element to being consistent with the new element. This is not true, and your unsynchronized reader thread may read inconsistent data. – Luca Reccia Nov 06 '17 at 15:13
  • @D.Rek, Interesting, I did assume this part was atomic. Thanks – jeremie Nov 06 '17 at 19:59
  • In fact, assignment of object references is atomic. Check my answer for details. – Duarte Meneses Nov 06 '17 at 23:24
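To illustrate the safe-publication hazard assylias raises about partially constructed objects, here is a hedged sketch (the class names are made up): with non-final fields and no synchronization, the Java Memory Model permits a reader to see the reference before it sees the constructor's writes.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the unsafe-publication hazard from the comments.
// Point's fields are deliberately non-final, so publishing a Point through
// an unsynchronized map gives readers no visibility guarantee for x and y.
class UnsafePublication {
    static final class Point {
        int x, y; // non-final: no safe-publication guarantee
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static final Map<String, Point> map = new HashMap<>(); // unsynchronized

    // Thread 1 would run: map.put("p", new Point(1, 2));
    // Thread 2 would run: Point p = map.get("p");
    // The JMM then allows thread 2 to observe p != null but p.x == 0,
    // because nothing orders the constructor's writes before the read.
}
```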

2 Answers

3

There are a couple of reasons why HashMap cannot be thread safe without some form of synchronization.

Hash collisions

HashMap handles collisions by building a linked list at each bucket node. If the list grows past a certain threshold, it is transformed into a tree. This means that in a multithreaded environment, two threads could be reading and writing the same linked list or tree concurrently, which is obviously problematic.
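As a hedged illustration (the class names are made up), you can force every entry into a single bucket by giving all keys the same hashCode; the shared chain that results is exactly the structure two unsynchronized threads would be racing over.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a key with a constant hashCode forces every entry into
// the same bucket, so all entries share one chain (a linked list, or a tree
// once the chain passes the treeify threshold).
class CollisionDemo {
    static final class CollidingKey {
        final int id;
        CollidingKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
    }

    static Map<CollidingKey, Integer> singleBucketMap(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(new CollidingKey(i), i);
        return map; // all n entries hang off one bucket's chain
    }
}
```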

One solution would be to re-create the entire linked list or tree and set it at the hash map node when it's ready (this is basically a "copy-on-write" strategy), but apart from the horrible performance, you would still need to use a lock when putting data (for the same reason as explained in the next point).[1]

Assigning a new reference to the table

You mentioned that for resizes we could use a "copy-on-write" strategy: once the new hash table is ready, we atomically swap it into the table reference. This does indeed work under the Java Memory Model,[2] provided the object is safely published (its member variables are final):

An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.

However, you would still need to synchronize the entire creation of the new hash table from the old one. Otherwise, two racing threads could resize the table concurrently, and you would essentially lose one of the put operations.
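That lost update can be replayed deterministically in a single thread. This is only a simulation of the interleaving (the names are made up), not real concurrency:

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic, single-threaded replay of the race described above: both
// "threads" copy the same old map before either publishes its swap, so one
// put is silently dropped.
class LostUpdateDemo {
    static Map<String, Integer> simulate() {
        Map<String, Integer> current = new HashMap<>();
        current.put("a", 1);

        Map<String, Integer> copy1 = new HashMap<>(current); // T1 copies
        Map<String, Integer> copy2 = new HashMap<>(current); // T2 copies before T1 publishes
        copy1.put("b", 2);                                   // T1's put
        copy2.put("c", 3);                                   // T2's put

        current = copy1; // T1 publishes {a=1, b=2}
        current = copy2; // T2 publishes {a=1, c=3}: T1's put of "b" is lost
        return current;
    }
}
```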

This is what is done in the CopyOnWriteArrayList, for example.


[1] Note that this lock could have a much smaller granularity than the entire table; in the limit, you could have a separate lock for every node. Using different locks for different sections of a hash table (lock striping) is a common strategy to achieve thread safety while minimizing contention.
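The footnote's idea can be sketched as lock striping. This is a simplified illustration under my own assumptions (StripedMap is a made-up name), not how java.util.concurrent actually implements it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lock-striping sketch: the table is split into fixed segments,
// each guarded by its own lock, so operations on different segments
// proceed without contending on a single global lock.
class StripedMap<K, V> {
    private static final int STRIPES = 16;
    private final Object[] locks = new Object[STRIPES];
    @SuppressWarnings("unchecked")
    private final Map<K, V>[] segments = new Map[STRIPES];

    StripedMap() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
            segments[i] = new HashMap<>();
        }
    }

    // Maps a key to its segment; the mask keeps the index non-negative.
    private int stripe(Object key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    public void put(K key, V value) {
        int s = stripe(key);
        synchronized (locks[s]) { segments[s].put(key, value); }
    }

    public V get(K key) {
        int s = stripe(key);
        synchronized (locks[s]) { return segments[s].get(key); }
    }
}
```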

[2] Another side note: assignment of object references and of primitive types is always atomic, except for long and double. Because these types are 64 bits wide, a racing thread could observe only one of the two 32-bit words written. Declaring such fields volatile guarantees atomic reads and writes.
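The footnote's point can be shown in declarations; the tear itself is timing- and platform-dependent, so this sketch (names are made up) only shows the two field variants without trying to reproduce it:

```java
// Per JLS §17.7, a plain long or double write may legally be performed as
// two separate 32-bit writes on some JVMs, so a racing reader can see a
// "torn" value. Declaring the field volatile restores atomicity. Reference
// and 32-bit primitive writes are already atomic without volatile.
class WideFields {
    long plainCounter;            // write may tear on 32-bit JVMs
    volatile long atomicCounter;  // volatile guarantees atomic 64-bit access
}
```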

Duarte Meneses
1

What you are describing is a form of snapshot semantics for reads. If the underlying implementation of your map supports such semantics, then there is indeed no need to synchronize read access. However, you cannot assume this holds for any map implementation.

Specifically, a map might choose to implement its buckets by separate chaining. In that case, an unsynchronized read might hit a bucket whose list is in the middle of being updated. In the worst case, when the underlying list is a linked list, such unsynchronized access can even leave the reading thread stuck in an infinite loop.
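The infinite-loop failure mode is hard to reproduce on demand, but the corrupted state it depends on, a cycle in a bucket chain, can be built by hand. This sketch uses made-up names and creates the cycle manually, whereas the well-known pre-Java-8 HashMap resize race could produce one spontaneously:

```java
// Illustrative only: a minimal singly linked bucket chain with a hand-made
// cycle, standing in for the corrupted chain a racy resize can produce.
// An unsynchronized reader walking such a chain never reaches null.
class CyclicChainDemo {
    static final class Node {
        final String key;
        Node next;
        Node(String key) { this.key = key; }
    }

    static Node corruptedChain() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // cycle: a -> b -> a -> ...
        return a;
    }

    // Walks the chain with a step bound; returns how many steps were taken.
    // A healthy chain terminates early; a cyclic one exhausts the bound,
    // which is exactly an unbounded loop in a real unsynchronized get().
    static int boundedWalk(Node head, int bound) {
        int steps = 0;
        for (Node n = head; n != null && steps < bound; n = n.next) steps++;
        return steps;
    }
}
```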

michid