11

I have a HashMap in my program which is accessed by multiple threads, and is occasionally set by a single thread.

For example:

Map<String, String> myMap = new HashMap<String, String>();

This is accessed by multiple threads. Once an hour, a single thread calls:

myMap = myRefreshedVersionOfTheMap;

So my question is whether or not this is thread safe. If both maps always have the key "importantKey", is it possible for a reading thread to ever access the map at a time when "importantKey" does not exist?

Edit:

Thanks to the answers, I've realized this question is actually independent of the HashMap. It was more a question about object reference assignment.

Randika Vishman
  • 7,983
  • 3
  • 57
  • 80
CorayThan
  • 17,174
  • 28
  • 113
  • 161
  • possible duplicate of [In Java, is it safe to change a reference to a HashMap read concurrently](http://stackoverflow.com/questions/7186808/in-java-is-it-safe-to-change-a-reference-to-a-hashmap-read-concurrently) – Raedwald Jun 21 '13 at 12:16

4 Answers4

11

This is not thread safe. Even though there are no writes to the map itself after the point of publication (from the point of view of the thread doing the publication), and reference assignment is atomic, the new Map<> has not been safely published. It particular, there are writes to the Map during its construction - either in the constructor, or after, depending on how you add those elements, and those writes may or may not be seen by other threads, since even though they intuitively occur before the map is published to the other threads, this isn't formally the case according to the memory model.

For an object to be safely published, it must be communicated to the outside world using some mechanism that either establishes a happens-before relationship between the object construction, the reference publication and the reference read, or it must use a handful of narrower methods which are guaranteed to be safe for publishing:

  • Initializing an object reference from a static initializer.
  • Storing a reference to it into a final field.

Your idiom would be safe if you declared myMap volatile. More details on safe publication can be found in JCIP (highly recommended), or here, or in this longer answer on a similar topic.

Community
  • 1
  • 1
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
  • +1, this is correct. I think one thing that drives this home is this line from the Wikipedia entry on the [Java Memory Model](http://en.wikipedia.org/wiki/Java_Memory_Model): "On modern platforms, code is frequently not executed in the order it was written". Your code may initialize the new map, THEN set the reference, but no guarantee the JVM will keep it that way!! – sharakan Jun 20 '13 at 18:45
  • You're saying I could have a method that that calls `Map myRefreshedVersionOfMap = null;` followed by `myRefreshedVersionOfMap = new HashMap();` `myMap = myRefreshedVersionOfMap;` but because code is frequently not executed in the order it was written, other threads accessing `myMap` could see it as `null` at some point in there? That makes no sense. This is all done in a single thread, single method. If I put values into the map before I assign `myMap` to be `myRefreshedMap` you're saying other threads could see the old version of the `myRefreshedMap` which existed before the assignment? – CorayThan Jun 20 '13 at 19:27
  • @CorayThan Not quite. myMap may have a non-null reference to the memory that's been allocated for the HashMap, but the HashMap constructor hasn't actually been run yet. If another thread accesses that memory location *after* that reference has been set, but *before* this thread runs the constructor, then Bad Things will happen. – sharakan Jun 20 '13 at 19:35
  • @CorayThan check out http://stackoverflow.com/questions/16213443/instruction-reordering-happens-before-relationship-in-java – sharakan Jun 20 '13 at 20:16
  • 3
    @CorayThan - not exactly. You don't even need to have ever set the value to null, and you don't need all this stuff with `myRefreshedVersionOfMap` (adding it makes no difference). I'm saying that if you initialize `myMap` at construction with say one key-value "foo=bar", and then later another thread set `myMap = otherMap`, where otherMap was created like `otherMap = new HashMap(); otherMap.put("foo", "dog")`, you are expecting that every thread either sees a map with foo=bar, or foo=dog. Actually however, you could see a map with no keys at all, a map with a null key, a map with foo=null. – BeeOnRope Jun 20 '13 at 23:20
  • 1
    ... or map which throws an NPE when you call get(), or a map that never returns from get (this happens frequently in real life, etc, etc). The naive concepts of "this happens before that" on a single thread don't apply directly when you talk about multiple threads. You can't reason about the order of events like they have some global serialized order. Two threads can see the same two events in the opposite order (even though you'd think that even if the order was random, it must have *some* fixed order - it doesn't). – BeeOnRope Jun 20 '13 at 23:22
  • 1
    Threads can see an "impossible" order, like an empty hash map even though you only set the global reference once it was non-empty etc. You can only reason about these things through the formal framework offered by the JMM. Again, I high recommend picking up JCIP - it's money well spent, but if you want a free option, start with [this post](http://jeremymanson.blogspot.com/2007/03/volatile-fields-and-synchronization.html) and work your way forward in time. Jeremy and co largely wrote the formal specification for the JMM, so you can consider it authoritative. – BeeOnRope Jun 20 '13 at 23:27
  • I've still not read JCIP, but I just finished Effective Java 2nd Ed., and item 66 contains an example that shows why `volatile` is required here, I believe. – CorayThan Jul 07 '14 at 09:33
  • @CorayThan - you can also check out [this answer](http://stackoverflow.com/a/41990379/149138) which goes into more detail and provides some useful links. – BeeOnRope Feb 02 '17 at 18:55
9

If you mean you are creating an entirely new Map and are assigning it to myMap which is what the other threads are accessing, then yes. Reference assignment is atomic. It's threadsafe because you are not modifying the contents of a Map while other threads are reading from it - you just have multiple threads reading from a Map.

You just need to declare it volatile so other threads don't cache it.

Brian Roach
  • 76,169
  • 12
  • 136
  • 161
  • In this case the map is a property of Spring Singleton-scoped bean, so I don't believe it could have caching issues like that, right? – CorayThan Jun 20 '13 at 17:33
  • 2
    Even if the class that contains the `Map` is a singleton you need `volatile`; it being a singleton just means there's (guaranteed) only one instance of that class shared among your threads. That's no different than multiple threads accessing a single instance of any other class. When you assign that new `Map` other threads wouldn't see it otherwise. – Brian Roach Jun 20 '13 at 17:47
  • 3
    This is not correct. Reference assignment being "atomic" just means that threads can't see a bogus reference (pointing nowhere - i.e., violating the security model of the JVM). It does not guarantee that the Map<> pointed to by the reference is fully constructed. Another thread could see the reference to the new map, but then access a Map<> which is only partly constructed, with unknown effects. `volatile` is required here for safe publication, not just to avoid caching. – BeeOnRope Jun 20 '13 at 17:54
  • 1
    SO you mean, exactly what my answer says? Oh no, wait, you're just being pedantic over *why* you need volatile. Carry on. – Brian Roach Jun 20 '13 at 17:55
  • 1
    Huh? Your entire answer is wrong - the reasoning in the first paragraph is completely wrong, it is *not safe*. – BeeOnRope Jun 20 '13 at 18:01
  • 2
    You mentioned volatile incidentally to avoid other threads caching it - in other words it's still safe but you might see stale values due to caching. This is not correct - it is completely unsafe regardless of stale values or not. Your program can terminate, behave exceptionally, go into an infinite loop, etc. – BeeOnRope Jun 20 '13 at 18:02
  • In particular, your answer to the second question posed by the OP: "If both maps always have the key "importantKey", is it possible for a reading thread to ever access the map at a time when "importantKey" does not exist?" is "No", but mine is "Yes, plus it may have other arbitrary bad behavior". – BeeOnRope Jun 20 '13 at 18:04
  • FWIW, here's a nearly [identical question](http://stackoverflow.com/q/7186808/149138), and the selected answer, and every other answer, is also wrong wrt to the behavior if volatile is missing. It's a subtle point and one that is often missed. The atomicity of reference writes doesn't guarantee anything about the contents of the the thing the reference points to. – BeeOnRope Jun 20 '13 at 18:07
  • 1
    When I said "ever possible", I did not mean to include the situation where cosmic rays flip the bits and make the map null, or other equally drastic situations like program termination. – CorayThan Jun 20 '13 at 18:08
  • 1
    Don't worry, I wasn't talking about that either. In a normally behaving JVM, it's totally possible for your program to read the map and not see that key, even though every map assigned to that reference has always had that key. It's subtle, but it's critical. See my reply below for the details. You can, in fact, reproduce this or similar safe publication issues locally - no cosmic rays needed. – BeeOnRope Jun 20 '13 at 18:12
  • @BeeOnRope is right about this. If the instructions in your code ran in the order it was written, you'd be fine. But there's no guarantee that it will. Volatile or another synch construct is necessary to make sure the Map initialization `happens-before` the reference assignment. – sharakan Jun 20 '13 at 18:47
  • @BeeOnRope though you are right, volatile write enforces two kinds of visibilities at the same time - (a) the write itself and (b) the writes before it, so it's no accident that people targeting (a) get the benefit of (b) too. – ZhongYu Jun 20 '13 at 18:52
  • 1
    Agreed, but it's important to understand the distinction, because there are cases where you don't care about visibility, but you always care about correctness. It is plain wrong to say the above is safe, except for visibility concerns. In fact, the OP specifically draws the distinction between visibility and safety when he asks about "importantKey". One reason not to care about visibility is when you know there will eventually be some other synchronization action between the threads, so the period during which you can see stale values is bounded. – BeeOnRope Jun 20 '13 at 23:13
  • 1
    Another reason you might care about this is because you need to know the guarantees around *un*safe publishing and immutable objects, e.g., for security reasons (String is designed specifically to mitigate these issues, for example). – BeeOnRope Jun 20 '13 at 23:29
  • @CorayThan - i hope you understand that using volatile is critical to correctness for this solution. – jtahlborn Jun 21 '13 at 12:00
  • Well looks like the OP was just looking for an answer which confirmed his intuitive understanding of how it should work, rather than the truth. Shows some of the risks of talking SO answers as gospel (although I make the same mistake). For sure anyone reading this ?n will just look at the first answer and thing "oh, it's safe w/o volatile if I don't care about stale values". – BeeOnRope Jun 21 '13 at 18:24
  • 1
    @CorayThan You are missing the point, the risk is not *a tiny chance for inconsistency*, you might get a completely broken map that throws exceptions that should never have been thrown. `volatile` does not increase complexity and hardly makes a difference from a performance perspective. The map *must* be volatile in this example. – assylias Jul 30 '13 at 16:24
  • For what it's worth, if you are ok with a _tiny_ chance for inconsistency, you should state that up front in your question (and, basically, you'll spend the next few days defining _tiny_) and not ask black and white questions such as "is this threadsafe". Something with any chance of inconsistency (and actually, as @assylias points out, that covers the entire spectrum of undefined behavior) is by definition not threadsafe. – BeeOnRope Oct 09 '13 at 06:38
0

First off, Java's HashMap class is not thread safe, so there are no guarantees when reads and writes are happening concurrently.

However, since reads and writes to references in Java are atomic, then the pattern you described could be thread-safe as long as the refresh code is not mutating the old map. For example, the following would be fine:

// this refresh code would be thread-safe
Map<String, String> copy = new HashMap<String, String>(myMap);
copy.put(x, y); // change the map
myMap = copy;

// you might also consider
myMap = Collections.unmodifiableMap(copy);
// to make sure that the non-thread-safe map will never be mutated

One thing to consider with this pattern is that you may want the myMap field to be declared as volatile so that all threads will get the most recent version of myMap whenever they read from that variable.

Finally, as other posters have mentioned ConcurrentHashMap may be a better approach depending on the complexity of the refresh code. One disadvantage of ConcurrentHashMap is that it doesn't offer any way to batch the operations, so you'd have to make sure that the state at every point during the refresh process was valid for the rest of your application to consume.

ChaseMedallion
  • 20,860
  • 17
  • 88
  • 152
  • This is essentially what I did. In the refresh method I fully construct a valid replacement map, then assign it in a single operation. – CorayThan Jun 20 '13 at 17:43
  • 1
    This is incorrect. Even though there are no reads or writes, the new HashMaps are not *safely published* to the outside world. Reading threads can see new HashMap maps in an indeterminate state. – BeeOnRope Jun 20 '13 at 17:57
  • @BeeOnRope: while I could see that being true for an arbitrary data structure, I'm not convinced that that is true of java's HashMap. The documentation specifically calls out the need for synchronization in the case of mutating the map: "If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally" (from http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html) – ChaseMedallion Jun 20 '13 at 21:07
  • That documentation is just explaining that HashMap isn't thread safe in general (e.g., unlike HashTable). It doesn't talk about safe publication and it's not a substitute for an understanding of that concept. The safe publication rules are universal and apply to all classes, HashMap included. It's also worth noting that this [text](http://docs.oracle.com/javase/1.4.2/docs/api/java/util/HashMap.html) existed even before the memory model was introduced in Java 5. I'd also not that the comments in the javadoc are establishing a necessary condition for thread safety, but not a sufficient one. – BeeOnRope Jun 20 '13 at 23:10
-1

HashMap is not thread safe. You can use any of the followings

  1. ConcurrentHashMap.
  2. HashMap with synchronized on the outside.
  3. Different HashMap for each thread.

Check this similar answer here

Community
  • 1
  • 1
stinepike
  • 54,068
  • 14
  • 92
  • 112
  • 1
    That doesn't appear to be what he's asking; he doesn't have concurrent read/writes, only reads and a reference assignment. – Brian Roach Jun 20 '13 at 17:35