4

I recently encountered a class on our code base that extends HashMap and synchronizes the put method.

Aside from it being less efficient than using ConcurrentHashMap, what sort of problems may arise with extending HashMap and synchronizing only put(K,V)?

Assume that we don't care whether get(K) returns the latest value or not(e.g. we are fine with threads overwriting each other and we don't care about the possible race conditions that may arise if the values of the map are used as locks themselves).

Example:

public class MyMap<K,V> extends HashMap<K,V> {
  //...
   public synchronized void put(K key, V value) {
      //...
   }

  //...
}

As far as I know, HashMap does its re-size with the put method and since the put is synchronized at the map instance level, the problems encountered during concurrent re-sizing will(probably) not be encountered.

Even with the questionable assumptions above, my gut-feeling is telling me that there are more problems that may arise. Or am I just being paranoid?

Update: Thank you all, that was fun and enlightening. If I ever meet the original of author of this particular class, I can now explain his folly in detail. :)

In summary: putAll can still screw up the data structure horribly and end up in the dreaded infinite-loop/data-race condition. get relies on the underlying internal data structures of hashmap that may be in the process of being modified concurrently causing the get process to behave weirdly. This is just a generally bad idea. At the very least, the author could have used Collections.synchronizedMap(Map) instead.

Note: All three answers given as of this writing are actually correct but I picked the one about get() as the correct answer since it was the least obvious one for me.

Hyangelo
  • 4,784
  • 4
  • 26
  • 33
  • But hashtable is synched like this. – huseyin tugrul buyukisik Sep 14 '12 at 18:29
  • Hashtable is fully synched (puts and gets). – Gilberto Torrezan Sep 14 '12 at 18:30
  • So, synched gets would make it even slower – huseyin tugrul buyukisik Sep 14 '12 at 18:30
  • Yeah, hash table has both put and get synchronized so it is locked down. It is slow but it is thread-safe. – Hyangelo Sep 14 '12 at 18:31
  • 1
    Is there any indication in the source code as to what the goal of synchronizing the `put(key, value)` method was? If it is to avoid two threads modifying the map structurally at the same time without synchronizing all operations, then another write operation was missed - `putAll(Map)` – matt b Sep 14 '12 at 18:46
  • Well I assume that whoever the original author was tried using just a simple HashMap and then encountered ConcurrentModificationException errors and then wised up a bit and did this. Also, you are right! putAll can screw this up since a resize can happen! – Hyangelo Sep 14 '12 at 18:49
  • @mattb you can submit the part about the putAll as an answer since you are correct that putAll opens up the problem. – Hyangelo Sep 14 '12 at 18:50
  • ConcurrentModificationException is more often than not generated by one thread, not multiple threads, in which case extra synchronization would be useless. – jtahlborn Sep 14 '12 at 18:54
  • Oops, you are right, HashMap has no way of detecting if it is being modified by more than one thread. In that case, I am not sure why the original author got this. That was pure speculation on my part. I'm merely the maintainer. :P – Hyangelo Sep 14 '12 at 18:58

3 Answers3

5

I hope you are also synchronizing on putAll and remove. putAll especially since multiple threads can try and resize your HashMap. Those methods too will be updating size and modCount which if done outside of synchronization can result in lost updates.

John Vint
  • 39,695
  • 7
  • 78
  • 108
3

since get() can be reading a changing data structure, everything bad can happen.

I've seen get() trapped in dead loop, so it's not just a theoretical possibility, bad things do happen.

irreputable
  • 44,725
  • 9
  • 65
  • 93
  • As stated, that is what my instincts are telling me but given the assumptions stated in my questions(no visibility guarantees needed, values stored on the map will not be used as locks themselves, we are ok with values getting overwritten), what are the things that may go wrong? – Hyangelo Sep 14 '12 at 18:39
  • 2
    @Hyangelo - you're misunderstanding. besides the (missing) visibility guarantees of the values, there is no guarantee of any consistency of the _HashMap_ internals. for instance, you could see an internal table full of null values. or any other internal object could have spurious null values, etc... – jtahlborn Sep 14 '12 at 18:47
  • 1
    @Hyangelo - also note that the visibility problems with the values themselves is not _just_ the possibility of seeing stale values, it also means the internal state of the value objects could be messed up (unless alternate synch/volatile is used for their internal state). – jtahlborn Sep 14 '12 at 18:52
  • You are right, when a get is done for a key that experienced a collision, it might have to traverse some sort of a list and then bad things can happen as you said. Thanks. – Hyangelo Sep 14 '12 at 18:54
  • @Hyangelo I've seen get() trapped in dead loop, so it's not just a theoretical possibility, bad things do happen. – irreputable Sep 14 '12 at 19:44
  • @irreputable Yep, I understand it now with the help of the other comments above. It would be helpful for future readers of this question if you can expound on how get can run into trouble(e.g. with the Entry array being modified by write operations while get is interacting with it). – Hyangelo Sep 14 '12 at 20:09
2

As I mentioned in the comments, another problem that can arise is that the putAll(Map) method does not appear to be synchronized. Since putAll can also modify the structure of the Map, it is unsafe to call it unsynchronized from one thread while another thread is using the same Map.

At a higher-level though, it'd be interesting to understand more of the why around why put(key, value) was synchronized. Even if you've now guarded against unsynchronized modifications to the map's structure, it still does not seem like a good idea for multiple threads to be accessing the map without synchronization. In fact, if thread A is attempting to iterate over the HashMap's contents, and thread B calls synchronized put(key, value), the iterator in Thread A will still fail fast and throw a ConcurrentModificationException rather than do something non-deterministic.

Even if you were to synchronize the putAll(Map) call, other threads iterating over the map's contents will still see exceptions. If the Map needs to be used in multiple threads and at least one of those threads needs to modify the Map, all of the calls need to be synchronized, period.

matt b
  • 138,234
  • 66
  • 282
  • 345
  • Good point on `ConcurrentModificationException` though the resolution point would be the same as `Collections.synchronizedMap` – John Vint Sep 14 '12 at 19:01