12

I have a piece of code that can be executed by multiple threads that needs to perform an I/O-bound operation in order to initialize a shared resource that is stored in a ConcurrentMap. I need to make this code thread safe and avoid unnecessary calls to initialize the shared resource. Here's the buggy code:

    private ConcurrentMap<String, Resource> map;

    // .....

    String key = "somekey";
    Resource resource;
    if (map.containsKey(key)) {
        resource = map.get(key);
    } else {
        resource = getResource(key); // I/O-bound, expensive operation
        map.put(key, resource);
    }

With the above code, multiple threads may check the ConcurrentMap and see that the resource isn't there, and all attempt to call getResource() which is expensive. In order to ensure only a single initialization of the shared resource and to make the code efficient once the resource has been initialized, I want to do something like this:

    String key = "somekey";
    Resource resource;
    if (!map.containsKey(key)) {
        synchronized (map) {
            if (!map.containsKey(key)) {
                resource = getResource(key);
                map.put(key, resource);
            }
        }
    }

Is this a safe version of double checked locking? It seems to me that since the checks are called on ConcurrentMap, it behaves like a shared resource that is declared to be volatile and thus prevents any of the "partial initialization" problems that may happen.

pmc255
  • 1,499
  • 2
  • 19
  • 31
  • 1
    If you look at the "Related" section over to the right of this page and down a bit, you'll see a lot of good information. In particular, the accepted answer in this question: http://stackoverflow.com/questions/157198/double-checked-locking-article – Jeremy Aug 09 '11 at 21:31
  • 1
    This whole argument is questionable. Have you actually timed how long it takes to enter a synchronized block? (which is what you're trying to avoid) In my tests I was getting 0 ms which tells me.. it does not matter – che javara Aug 17 '12 at 14:22
  • It isn't the time to enter the synchronized block that pmc255 is trying to avoid, but the time to load the resource via getResource(key), which pmc255 said was an "I/O-bound, expensive operation". – David Conrad Sep 26 '13 at 22:02

6 Answers6

4

If you can use external libraries, take a look at Guava's MapMaker.makeComputingMap(). It's tailor-made for what you're trying to do.

dnault
  • 8,340
  • 1
  • 34
  • 53
  • I looked into the impl. It's ungodly complicated. I don't believe anybody can analyze its concurrency behavior. – irreputable Aug 09 '11 at 22:50
  • 1
    The computing map from Guava (Google collections) ensures the (expensive) computation is done only once. Brian Goetz also has this pattern in his book under the name "Memoizer". – sjlee Aug 10 '11 at 00:23
  • Well, I'm constructing the ConcurrentMap using MapMaker. My question is whether using a ConcurrentMap (with its contract as defined by its interface) is a sufficient workaround for the common double checked locking problem. – pmc255 Aug 10 '11 at 01:31
  • 2
    This functionality has now been moved out of `MapMaker` to [CacheBuilder](http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html) – MikeFHay Dec 18 '13 at 15:16
3

yes it' safe.

If map.containsKey(key) is true, according to doc, map.put(key, resource) happens before it. Therefore getResource(key) happens before resource = map.get(key), everything is safe and sound.

irreputable
  • 44,725
  • 9
  • 65
  • 93
  • Can you link to the doc that says that? i can't seem to find that bit EDIT: Is [this the doc](http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/ConcurrentHashMap.html) you were talking about – jonatzin Apr 28 '14 at 10:24
2

Why not use the putIfAbsent() method on ConcurrentMap?

if(!map.containsKey(key)){
  map.putIfAbsent(key, getResource(key));
}

Conceivably you might call getResource() more than once, but it won't happen a bunch of times. Simpler code is less likely to bite you.

rfeak
  • 8,124
  • 29
  • 28
  • 8
    duplicate getResource() is exactly what he wants to avoid – irreputable Aug 09 '11 at 21:46
  • 1
    irreputable is right - the getResource() call is what I want to only call once. – pmc255 Aug 10 '11 at 01:30
  • 1
    You should still use putIfAbsent, but you can put it inside the sync block. – rfeak Aug 11 '11 at 20:13
  • 2
    @rfeak Then you have a sync block around everything and you have to synchronize every time you get something from the map. The whole point of double-checked locking is to avoid that extra synchronization. – Rag Oct 23 '13 at 16:04
1

In general, double-checked locking is safe if the variable you're synchronizing on is marked volatile. But you're better off synchronizing the entire function:


public synchronized Resource getResource(String key) {
  Resource resource = map.get(key);
  if (resource == null) {
    resource = expensiveGetResourceOperation(key);    
    map.put(key, resource);
  }
  return resource;
}

The performance hit will be tiny, and you'll be certain that there will be no sync problems.

Edit:

This is actually faster than the alternatives, because you won't have to do two calls to the map in most cases. The only extra operation is the null check, and the cost of that is close to zero.

Second edit:

Also, you don't have to use ConcurrentMap. A regular HashMap will do it. Faster still.

ccleve
  • 15,239
  • 27
  • 91
  • 157
  • 2
    This method will likely be called many many times over and over again; synchronizing the entire method seems expensive, which is what I wanted to avoid in the first place. Additionally, the call to a ConcurrentMap.containsKey should be slightly more efficient since the locking for a ConcurrentMap is more granular than an object-level lock that synchronizes the entire containsKey operation. That is, simultaneous calls to containsKey are possible, whereas the synchronized getResource() will ALWAYS be accessible to one thread at a time, even after the resource has been initialized. – pmc255 Aug 10 '11 at 17:04
  • 1
    In fact, your example is exactly like one of the poorly performing examples of "memoization" as listed in section 5.6 of Java Concurrency In Practice. – pmc255 Aug 10 '11 at 17:14
  • I would agree with this answer. When I timed the performance hit of entering a synchronized block AND the hashmap operations it was still < 1ms. You need to have a justification for sub millisecond optimizations. – che javara Aug 17 '12 at 14:41
  • 2
    The speed of this operation depends on the amount of contention. If there is little contention, it is far and away the fastest method. If there is a lot of contention then it might be slower. As always, don't speculate, write a benchmark. – ccleve Mar 07 '15 at 16:07
0

The verdict is in. I timed 3 different solutions in nanosecond accuracy, since after all the initial question was about performance:

Fully synching the function on a regular HashMap:

synchronized (map) {

   Object result = map.get(key);
   if (result == null) {
      result = new Object();
      map.put(key, result);
   }                
   return result;
}

first invocation: 15,000 nanoseconds, subsequent invocations: 700 nanoseconds

Using the double check lock with a ConcurrentHashMap:

if (!map.containsKey(key)) {
   synchronized (map) {
      if (!map.containsKey(key)) {
         map.put(key, new Object());
      }
   }
} 
return map.get(key);

first invocation: 15,000 nanoseconds, subsequent invocations: 1500 nanoseconds

A different flavor of double checked ConcurrentHashMap:

Object result = map.get(key);
if (result == null) {
   synchronized (map) {
      if (!map.containsKey(key)) {
         result = new Object();
         map.put(key, result);
      } else {
         result = map.get(key);
      }
   }
} 

return result;

first invocation: 15,000 nanoseconds, subsequent invocations: 1000 nanoseconds

You can see that the biggest cost was on the first invocation, but was similar for all 3. Subsequent invocations were the fastest on the regular HashMap with method sync like user237815 suggested but only by 300 NANO seocnds. And after all we are talking about NANO seconds here which means a BILLIONTH of a second.

che javara
  • 748
  • 1
  • 8
  • 18
  • Now replace `new Object()` with something that reads and parses a few megabytes of XML from the network, and you'll have something like what pmc255 was actually asking about. You have misunderstood the question. – David Conrad Sep 26 '13 at 22:08
  • Nope, I didn't misunderstand the question. It was about checking that a map does not contain an object before constructing it and putting it in the map and making sure that its thread safe. Doesn't matter what new Object() does, it will be called only if the object does not exist. – che javara Sep 30 '13 at 16:53
0

No need for that - ConcurrentMap supports this as with its special atomic putIfAbsent method.

Don't reinvent the wheel: Always use the API where possible.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • 6
    Before I call putIfAbsent, I need to fetch the value that I want to put in the map; that fetch operation is expensive. Using putIfAbsent doesn't solve that. – pmc255 Aug 10 '11 at 02:20
  • that leave you with redundant expensive initialization of the thing to be inserted if you lose the race. – bmargulies Feb 20 '12 at 13:08
  • @bmargulies that is true, but it only happens rarely during race conditions for the same key, so it shouldn't hurt too much. You can put extra guards around it, but I would wait to see what happens at runtime and only "fix" it if it *needs* fixing – Bohemian Feb 20 '12 at 13:34
  • 2
    In my case, the expensive thing has side effects and absolutely cannot be run extra times. The Guava ComputedMap looks appropriate. – bmargulies Feb 20 '12 at 18:17