33

Is ConcurrentHashMap.get() guaranteed to see a previous ConcurrentHashMap.put() by different thread? My expectation is that is is, and reading the JavaDocs seems to indicate so, but I am 99% convinced that reality is different. On my production server the below seems to be happening. (I've caught it with logging.)

Pseudo code example:

static final ConcurrentHashMap map = new ConcurrentHashMap();
//sharedLock is key specific.  One map, many keys.  There is a 1:1 
//      relationship between key and Foo instance.
void doSomething(Semaphore sharedLock) {
    boolean haveLock = sharedLock.tryAcquire(3000, MILLISECONDS);

    if (haveLock) {
        log("Have lock: " + threadId);
        Foo foo = map.get("key");
        log("foo=" + foo);

        if (foo == null) {
            log("New foo time! " + threadId);
            foo = new Foo(); //foo is expensive to instance
            map.put("key", foo);

        } else
            log("Found foo:" + threadId);

        log("foo=" + foo);
        sharedLock.release();

    } else
        log("No lock acquired");
} 

What seems to be happening is this:

Thread 1                          Thread 2
 - request lock                    - request lock
 - have lock                       - blocked waiting for lock
 - get from map, nothing there
 - create new foo
 - place new foo in map
 - logs foo.toString()
 - release lock
 - exit method                     - have lock
                                   - get from map, NOTHING THERE!!! (Why not?)
                                   - create new foo
                                   - place new foo in map
                                   - logs foo.toString()
                                   - release lock
                                   - exit method

So, my output looks like this:

Have lock: 1    
foo=null
New foo time! 1
foo=foo@cafebabe420
Have lock: 2    
foo=null
New foo time! 2
foo=foo@boof00boo    

The second thread does not immediately see the put! Why? On my production system, there are more threads and I've only seen one thread, the first one that immediately follows thread 1, have a problem.

I've even tried shrinking the concurrency level on ConcurrentHashMap to 1, not that it should matter. E.g.:

static ConcurrentHashMap map = new ConcurrentHashMap(32, 1);

Where am I going wrong? My expectation? Or is there some bug in my code (the real software, not the above) that is causing this? I've gone over it repeatedly and am 99% sure I'm handling the locking correctly. I cannot even fathom a bug in ConcurrentHashMap or the JVM. Please save me from myself.

Gorey specifics that might be relevant:

  • quad-core 64-bit Xeon (DL380 G5)
  • RHEL4 (Linux mysvr 2.6.9-78.0.5.ELsmp #1 SMP ... x86_64 GNU/Linux)
  • Java 6 (build 1.6.0_07-b06, 64-Bit Server VM (build 10.0-b23, mixed mode))
Stu Thompson
  • 38,370
  • 19
  • 110
  • 156
  • Your function get's a parameter called "sharedLock" but you use a member called "lock" ("sharedLock" isn't used at all). Is this intentional? – Arne Deutsch Nov 20 '09 at 12:40
  • This is the point where I'd get paranoid. In addition to logging the last foo right before releasing the lock, also log map.get( "key" ) and the identity hash code for the map. On quick read through, it certainly all looks ok and in fact the lock should be enough if the map is not modified elsewhere. Perhaps there is something significant that didn't make it through the stripping for the question? – PSpeed Nov 20 '09 at 12:45
  • @Arne: no, just a typo. Will fix. – Stu Thompson Nov 20 '09 at 12:47
  • 1
    Make a proxy of (or simply extend) ConcurrentHashMap and log the timing and arguments for get() and put(), this could shed some more light – Bozho Nov 20 '09 at 12:51
  • @PSpeed: The map is occasionally modified elsewhere, but not on that key. (The lock is key specific.) **Could that be a mitigating factor?** If the lock were global, then I might as well synchronize on the map, which is something one should not have to do with `ConcurrentHashMap`. – Stu Thompson Nov 20 '09 at 12:52
  • I'm willing to trust ConcurrentHashMap more than I'm willing to trust, say, "log()"; or how you've created your threads. So I'm guessing that there's some interaction that you're not accounting for. Please post an entire SCSCE; you can use Object rather than Foo and you'll get the same results. – kdgregory Nov 20 '09 at 13:00
  • The javadocs for ConcurrentHashMap state that "retrieval operations do not block". Do you get the same results if you lower the concurrency level? How about substituting in a Collections.synchronized map to check that your test is correctly designed? – Steve B. Nov 20 '09 at 13:01
  • 4
    Oh, and when you write something, simply use System.err.println() -- note that it's System.err, not System.out; the latter is buffered, the former isn't. – kdgregory Nov 20 '09 at 13:01
  • @Steve B.: I've tried a concurrency level of 1, as stated in the question. I am loath to use `Collections.synchronized` with `ConcurrentHashMap` because I just should not have to, CHM is a replacement for that approach. – Stu Thompson Nov 20 '09 at 13:05
  • **Regarding logging:** Foo is so ridiculously expensive to create *(seconds!)* that I do not believe there is an issue there. Regardless, thread 2 should get have foo@cafebabe420 with the get, and not create a new instance--the order of log entries won't matter. – Stu Thompson Nov 20 '09 at 13:08
  • 3
    @Stu, this has nothing to do with your problem, but I'd strongly advise you to put the `sharedLock.release()` call inside of a `finally` block. If an exception occurs between the `tryAcquire` and the `release` calls, the semaphore will end up locked forever. – gustafc Nov 20 '09 at 13:28
  • I don't think it could cause the behaviour you're seeing, but the foo that you put in the map is not the foo whos address you print out. You've got two calls to new Foo() in there. If you didn't want to override CHM, you could add some logging to the method - how long it waits for the lock, what the locks address is, what the CHMs address is etc. – MHarris Nov 20 '09 at 13:29
  • @gustafc: I do. For communication purposes, I've trimmed down the code to an absolute minimum pseudo code. But thanks anyway. :) – Stu Thompson Nov 20 '09 at 13:38
  • @MHarris: good catch! that is an error in my pseudo code. Correcting now. – Stu Thompson Nov 20 '09 at 13:42
  • Hm, why don't you upload somewhere an archive with your code, and give a link here, so that we could experiment. – Bozho Nov 20 '09 at 13:58
  • @Bozho: Because my employer, while generally cool about stuff, is irrationally paranoid about source code that way--everyone is out to get it! – Stu Thompson Nov 20 '09 at 18:18
  • How do you know the problem is not in your semaphore cache? are you sure you are handing out correct semaphore instances correlated w/ keys? – james Nov 20 '09 at 18:47
  • Yes, I am sure. In my production logs I recorded `semaphore.toString()` liberally. I am 100% it was the same instance ID. In fact, my first debugging suspect was the semaphores. 100% sure. – Stu Thompson Nov 20 '09 at 18:53
  • I tried your code and see no problem of execution. The locking via the Semaphore should have provided the effect of double-check locking, given ConcurrentHashMap provides the same guarantee as using `volatile`, i.e. 2nd thread sees either null or fully completed object. – lcn Dec 04 '15 at 20:33

8 Answers8

11

Some good answers here, but as far as I can tell no-one has actually provided a canonical answer to the question asked: "Is ConcurrentHashMap.get() guaranteed to see a previous ConcurrentHashMap.put() by different thread". Those that have said yes haven't provided a source.

So: yes, it is guaranteed. Source (see the section 'Memory Consistency Properties'):

Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.

Cowan
  • 37,227
  • 11
  • 66
  • 65
  • 3
    I don't think that answers the question. I interpret the quote as saying that if you do a put, everything else the thread would have done before the put, will also be visible to the thread doing the get by the time the get succeeds. (The get might still find a null or old value, as long as nothing else the other put thread has done becomes visible to the get thread). IOW, there is no strict time guarantee but there is a relative ordering guarantee. BUT the OPs example should not be affected by the ordering.. – jasonk Aug 28 '13 at 09:40
10

This issue of creating an expensive-to-create object in a cache based on a failure to find it in the cache is known problem. And fortunately this had already been implemented.

You can use MapMaker from Google Collecitons. You just give it a callback that creates your object, and if the client code looks in the map and the map is empty, the callback is called and the result put in the map.

See MapMaker javadocs ...

 ConcurrentMap<Key, Graph> graphs = new MapMaker()
       .concurrencyLevel(32)
       .softKeys()
       .weakValues()
       .expiration(30, TimeUnit.MINUTES)
       .makeComputingMap(
           new Function<Key, Graph>() {
             public Graph apply(Key key) {
               return createExpensiveGraph(key);
             }
           });

BTW, in your original example there is no advantage to using a ConcurrentHashMap, as you are locking each access, why not just use a normal HashMap inside your locked section?

Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880
David Roussel
  • 5,788
  • 1
  • 30
  • 35
  • The Semaphore is map key specific, not map specific. This is the third time someone's raised the issue...will modify the answer to make this clear. – Stu Thompson Nov 20 '09 at 14:24
  • Could you elaborate on the "failure to find in the cache is a known problem", in his example he states when he accesses the cache it's null. I'd like to know more about this scenario, or point me in a direction to find out more. – reccles Nov 20 '09 at 17:55
  • @reccles - There is a general pattern where you need to compute a value based on some inputs, but the computation is expensive, so you want to re-use previously computed values when you can. The contept is called memoization (see http://en.wikipedia.org/wiki/Memoization). Google's MapMaker (see link above in my comment) is a a simple way of implementing memoziation in java with maps. – David Roussel Nov 26 '09 at 13:08
  • If I use a HashMap and not ConcurrentHashMap then other threads are not guaranteed to see changes. My Map value change over time. Alas, this is all mute now...I opted for a dramatic refactoring (fresh rewrite) and things look good. – Stu Thompson Sep 07 '10 at 06:13
4

If a thread puts a value in concurrent hash map then some other thread that retrieves the value for the map is guaranteed to see the values inserted by the previous thread.

This issue has been clarified in "Java Concurrency in Practice" by Joshua Bloch.

Quoting from the text :-

The thread-safe library collections offer the following safe publication guarantees, even if the javadoc is less than clear on the subject:

  • Placing a key or value in a Hashtable, synchronizedMap or Concurrent-Map safely publishes it to any other thread that retrieves it from the Map (whether directly or via an iterator);
3

One thing to consider, is whether your keys are equal and have identical hashcodes at both times of the "get" call. If they're just Strings then yes, there's not going to be a problem here. But as you haven't given the generic type of the keys, and you have elided "unimportant" details in the pseudocode, I wonder if you're using another class as a key.

In any case, you may want to additionally log the hashcode of the keys used for the gets/puts in threads 1 and 2. If these are different, you have your problem. Also note that key1.equals(key2) must be true; this isn't something you can log definitively, but if the keys aren't final classes it would be worth logging their fully qualified class name, then looking at the equals() method for that class/classes to see if it's possible that the second key could be considered unequal to the first.

And to answer your title - yes, ConcurrentHashMap.get() is guaranteed to see any previous put(), where "previous" means there is a happens-before relationship between the two as specified by the Java Memory Model. (For the ConcurrentHashMap in particular, this is essentially what you'd expect, with the caveat that you may not be able to tell which happens first if both threads execute at "exactly the same time" on different cores. In your case, though, you should definitely see the result of the put() in thread 2).

Andrzej Doyle
  • 102,507
  • 33
  • 189
  • 228
  • I'm sure the keys are good because I use the same exact keys for the Semaphore cache, and I'm 100% sure *that* is working correctly. Thanks, though. – Stu Thompson Nov 20 '09 at 18:02
2

I don't think the problem is in "ConcurrentHashMap" but rather somewhere in your code or about the reasoning about your code. I can't spot the error in the code above (maybe we just don't see the bad part?).

But to answer your question "Is ConcurrentHashMap.get() guaranteed to see a previous ConcurrentHashMap.put() by different thread?" I've hacked together a small test program.

In short: No, ConcurrentHashMap is OK!

If the map is written badly the following program shoukd print "Bad access!" at least from time to time. It throws 100 Threads with 100000 calls to the method you outlined above. But it prints "All ok!".

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class Test {
    private final static ConcurrentHashMap<String, Test> map = new ConcurrentHashMap<String, Test>();
    private final static Semaphore lock = new Semaphore(1);
    private static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(100);
        List<Callable<Boolean>> testCalls = new ArrayList<Callable<Boolean>>();
        for (int n = 0; n < 100000; n++)
            testCalls.add(new Callable<Boolean>() {
                @Override
                public Boolean call() throws Exception {
                    doSomething(lock);
                    return true;
                }
            });
        pool.invokeAll(testCalls);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("All ok!");
    }

    static void doSomething(Semaphore lock) throws InterruptedException {
        boolean haveLock = lock.tryAcquire(3000, TimeUnit.MILLISECONDS);

        if (haveLock) {
            Test foo = map.get("key");
            if (foo == null) {
                foo = new Test();
                map.put("key", new Test());
                if (counter > 0)
                    System.err.println("Bad access!");
                counter++;
            }
            lock.release();
        } else {
            System.err.println("Fail to lock!");
        }
    }
}
Arne Deutsch
  • 14,629
  • 5
  • 53
  • 72
  • How many cores did your test machine have? Thanks for writing this up. I'm going to run it on my production machine this evening. – Stu Thompson Nov 20 '09 at 13:11
  • 2
    Testing multi-threaded code like this does not work. All the iterations are likely to run with exactly the same thread interleavings. – Jørgen Fogh Nov 20 '09 at 14:15
  • What do u mean by "No, ConcurrentHashMap is OK" . Does it mean get is guaranteed to see a previous ConcurrentHashMap.put() by different thread or not? – AKS Aug 30 '13 at 19:34
1

Update: putIfAbsent() is logically correct here, but doesn't avoid the problem of only creating a Foo in the case where the key is not present. It always creates the Foo, even if it doesn't end up putting it in the map. David Roussel's answer is good, assuming you can accept the Google Collections dependency in your app.


Maybe I'm missing something obvious, but why are you guarding the map with a Semaphore? ConcurrentHashMap (CHM) is thread-safe (assuming it's safely published, which it is here). If you're trying to get atomic "put if not already in there", use chm.putIfAbsent(). If you need more complciated invariants where the map contents cannot change, you probably need to use a regular HashMap and synchronize it as usual.

To answer your question more directly: Once your put returns, the value you put in the map is guaranteed to be seen by the next thread that looks for it.

Side note, just a +1 to some other comments about putting the semaphore release in a finally.

if (sem.tryAcquire(3000, TimeUnit.MILLISECONDS)) {
    try {
        // do stuff while holding permit    
    } finally {
        sem.release();
    }
}
overthink
  • 23,985
  • 4
  • 69
  • 69
  • I'm not guarding the map, but a key to the map. The Semaphore is key-specific. `.putIfAbsent()` does not work for me--the object creation is expensive. – Stu Thompson Nov 20 '09 at 14:22
  • @Stu: Not sure I follow. In the example above, the keys in the map are just (immutable) Strings. Even if they aren't in the real code, you're just using the key to look something up (right?), so I don't immediately see why the key itself needs a guard. – overthink Nov 20 '09 at 14:26
  • There is a 1:1 relationship between key and expensive Foo instance. I don't want to create more one Foo instance simultaneously. I *do* want new Foo instances to be created for other keys, simultaneously if need be. I don't want to serialize all Foo instance creation for any given key. – Stu Thompson Nov 20 '09 at 14:30
  • Ok, I see what you're saying. I hereby like David Roussel's answer then :) (I've been working in Scala too much; was thinking putIfAbsent() would only execute the 'new Foo()' if the key was absent, but that's of course not the case) – overthink Nov 20 '09 at 14:39
  • Scala would not instance Foo in that context? Wow...I really, really need to look at Scala some day... – Stu Thompson Nov 20 '09 at 15:46
  • Well, Scala *would* instantiate Foo in this exact context :), but it's common in Scala to encounter (and create) APIs that delay evaluation of parameters (so-called "pass by name" parameters) by wrapping them in a function (i.e. same thing that the Google Collections API is doing above). Scala is an interesting beast. Worth a look. – overthink Nov 20 '09 at 17:31
  • My app actually swaps out the Foo instances every now and then. `.putIfAbsent()` is not what I want. (Actually, I am successfully using `.putIfAbsent()` to permanently cache the Semaphores for access by client threads wanting access to Foo.) – Stu Thompson Nov 20 '09 at 18:00
  • And as I said elsewhere, the semaphore *is* in a finally block in my production code. That, and other error handling code, has been removed for clarity's sake. – Stu Thompson Nov 20 '09 at 18:58
0

Are we seeing an interesting manifestation of the Java Memory Model? Under what conditions are registers flushed to main memory? I think it's guaranteed that if two threads synchronize on the same object then they will see a consistent memory view.

I don't know what Semphore does internally, it almost obviously must do some synchronize, but do we know that?

What happens if you do

synchronize(dedicatedLockObject)

instead of aquiring the semaphore?

djna
  • 54,992
  • 14
  • 74
  • 117
  • `Foo` is so expensive to create (*seconds!*) that I am 99.9999% sure it is OK--I see that thread 2 does not enter it's logic until after thread 1 has released the lock. And `Semaphore` is, after all, in `java.util.concurrent`...it really should be good. – Stu Thompson Nov 20 '09 at 12:59
  • I will try out your suggestion though this evening. A key benefit of Semaphore is that it times out and I can exit. This is happening in a webapp request, and locking up a thread permanently can take down the server. Bad. *Baaaaaad.* – Stu Thompson Nov 20 '09 at 13:01
  • The question is whether those "CPU" registers for thread 1 are flushed, the only guarantees I can find are when synchronized is used. I agree that it would be surprising if this was the problem, but what else do we have? You can use wait/notify to implemented a timeout. – djna Nov 20 '09 at 13:06
  • I will keep this in mind. But I think for now I'm going to pursue the *"metered CHM sub-class"* logging with `System.err` suggested elsewhere. Might as well do the same thing for Semaphore while I'm at it... – Stu Thompson Nov 20 '09 at 13:46
  • The whole point of a semaphore is to synchronise across threads. It may use a simpler technique than a general lock, but it is guaranteed to be updated correctly. – Jørgen Fogh Nov 20 '09 at 14:17
  • @Jørgen: Thanks for the confirmation on that, and hence my usage of it. All indications are that it is OK. – Stu Thompson Nov 20 '09 at 14:23
  • 1
    Even without some kind of synchronized block, the actual content of a ConcurrentHashMap is stored in field declared as volatile, so reads from one thread are still guaranteed (or at least supposed to be guaranteed) to see changes made from a different thread. – jarnbjo Nov 20 '09 at 14:48
  • Just a not from some previous comments, it's not the registers so much as the memory itself. Java makes no guarantees other than that two threads will see the same state at the _start_ of a synchronized block. Threads may not be operating on the same copy of memory but synchronize synchs them up. The long-standing issue with Java has been that the end of a synchronized block doesn't guarantee everything is flushed. However... the volatile keyword has tighter constraints here and is used appropriately within ConcurrentHashMap. Odds are the problem is elsewhere. – PSpeed Nov 20 '09 at 14:50
  • Odds are the problem is *in my code*. It smells like that, now that I've taken in to account everyone's input. Sigh...I know what I'll be doing this weekend! Thanks guys. More later. – Stu Thompson Nov 20 '09 at 18:05
0

Why are you locking a concurrent hash map? By def. its thread safe. If there's a problem, its in your locking code. That's why we have thread safe packages in Java The best way to debug this is with barrier synchronization.

bigtalktheory
  • 51
  • 1
  • 1