6

I'm trying to implement a thread-safe Map cache, and I want the cached Strings to be lazily initialized. Here's my first pass at an implementation:

public class ExampleClass {

    private static final Map<String, String> CACHED_STRINGS = new HashMap<String, String>();

    public String getText(String key) {

        String string = CACHED_STRINGS.get(key);

        if (string == null) {

            synchronized (CACHED_STRINGS) {

                string = CACHED_STRINGS.get(key);

                if (string == null) {
                    string = createString();
                    CACHED_STRINGS.put(key, string);
                }
            }
        }

        return string;
    }
}

After writing this code, Netbeans warned me about "double-checked locking," so I started researching it. I found The "Double-Checked Locking is Broken" Declaration and read it, but I'm unsure if my implementation falls prey to the issues it mentioned. It seems like all the issues mentioned in the article are related to object instantiation with the new operator within the synchronized block. I'm not using the new operator, and Strings are immutable, so I'm not sure that if the article is relevant to this situation or not. Is this a thread-safe way to cache strings in a HashMap? Does the thread-safety depend on what action is taken in the createString() method?

stiemannkj1
  • 4,418
  • 3
  • 25
  • 45
  • 7
    Why not just use `ConcurrentHashMap`? – Andy Turner Feb 08 '16 at 14:20
  • 1
    And to answer your title: yes, provided all other accesses to the map are synchronized too. – Andy Turner Feb 08 '16 at 14:21
  • `final` has nothing to do with what you are trying to do! –  Feb 08 '16 at 14:25
  • Hmm, I was told `HashMap` would perform better. It seems [self evident](http://stackoverflow.com/questions/1378310/performance-concurrenthashmap-vs-hashmap), but perhaps I'm prematurely optimizing. I would still like to know whether this approach will work even if it isn't the best or most obvious. – stiemannkj1 Feb 08 '16 at 14:26
  • @JarrodRoberson, so whether it's `final` or not has no effect on the thread-safety in this situation? – stiemannkj1 Feb 08 '16 at 14:27
  • Final: http://stackoverflow.com/questions/15655012/how-final-keyword-works – Austin Feb 08 '16 at 14:28
  • There is actually the *volatile solution*, see the section *Fixing Double-Checked Locking using Volatile* in the link you provided. Unless you're using JDK4 or earlier. – sp00m Feb 08 '16 at 14:29
  • I know, about `volatile`, but I was under the (seemingly mistaken) impression that double-checked locking being broken had to do with using the `new` operator within the `synchronized` block since all the examples from the article use it. – stiemannkj1 Feb 08 '16 at 14:31
  • see [The final word on final](http://www.vertigrated.com/blog/2011/05/the-final-word-on-final/) for an indepth explination of what `final` is for. `final` is critical for using a reference as a lock target, but it does not have anything specific to do with `concurrency`. –  Feb 08 '16 at 14:37
  • 1
    In this case, it looks like the JVM cannot reorder the assignment and the placing the string in the map. The question comes down to, have you exposed an un-intitialized String via the map? If it does, I doubt the volatile fix would fix the issue because only the map is volatile. So, is there any guarantee that the string will finish being initialized by the time it is in the map? – matt Feb 08 '16 at 14:50
  • @matt, can a `String` be partially initialized? It seems like [initialization of a `String` should be atomic](http://stackoverflow.com/questions/4756536/what-operations-in-java-are-considered-atomic). – stiemannkj1 Feb 08 '16 at 15:02
  • I would say yes, there is no synchronization when creating a string. Which means there is no guarantee that the string will be finished being created. Only a few specific actions are atomic without using synchronized, volatile. eg. The assignment of a 32bit int or float as per your example. [string](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java) – matt Feb 08 '16 at 15:09
  • @JarrodRoberson -- `final` has something to do with concurrency since java 5 ... they added another meaning to `final` (which is rather unrelated to the previous meanings). see my explanation http://stackoverflow.com/questions/35167777/are-immutable-objects-immune-to-improper-publication/35169705#35169705 – ZhongYu Feb 08 '16 at 19:49
  • double-checked locking does work with immutables; but your real problem is to read/write a HashMap concurrently, which is highly dangerous. the danger is not just theoretical. google `hashmap high cpu usage` -- and you'll see, it could completely freeze the application. – ZhongYu Feb 08 '16 at 19:51

4 Answers4

5

No it's not correct because the first access is done out side of a sync block.

It's somewhat down to how get and put might be implemented. You must bare in mind that they are not atomic operations.

For example, what if they were implemented like this:

public T get(string key){
    Entry e = findEntry(key);
    return e.value;
}

public void put(string key, string value){
    Entry e = addNewEntry(key);
    //danger for get while in-between these lines
    e.value = value;
}

private Entry addNewEntry(key){
   Entry entry = new Entry(key, ""); //a new entry starts with empty string not null!
   addToBuckets(entry); //now it's findable by get
   return entry; 
}

Now the get might not return null when the put operation is still in progress, and the whole getText method could return the wrong value.

The example is a bit convoluted, but you can see that correct behaviour of your code relies on the inner workings of the map class. That's not good.

And while you can look that code up, you cannot account for compiler, JIT and processor optimisations and inlining which effectively can change the order of operations just like the wacky but correct way I chose to write that map implementation.

weston
  • 54,145
  • 21
  • 145
  • 203
3

Consider use of a concurrent hashmap and the method Map.computeIfAbsent() which takes a function to call to compute a default value if key is absent from the map.

Map<String, String> cache = new ConcurrentHashMap<>(  );
cache.computeIfAbsent( "key", key -> "ComputedDefaultValue" );

Javadoc: If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.

Prim
  • 2,880
  • 2
  • 15
  • 29
0

Non-trivial problem domains:

Concurrency is easy to do and hard to do correctly.

Caching is easy to do and hard to do correctly.

Both are right up there with Encryption in the category of hard to get right without an intimate understanding of the problem domain and its many subtle side effects and behaviors.

Combine them and you get a problem an order of magnitude harder than either one.

This is a non-trivial problem that your naive implementation will not solve in a bug free manner. The HashMap you are using is not going to threadsafe if any accesses are not checked and serialized, it will not be performant and will cause lots of contention that will cause lot of blocking and latency depending on the use.

The proper way to implement a lazy loading cache is to use something like Guava Cache with a Cache Loader it takes care of all the concurrency and cache race conditions for you transparently. A cursory glance through the source code shows how they do it.

Community
  • 1
  • 1
  • 3
    Can you give an example of why this implementation produces bugs? – Austin Feb 08 '16 at 14:30
  • Thanks, +1 this may be asking too much, but could you possibly give a simple example of a situation where this would break? Something like the examples in the [atricle](http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html) I mentioned. – stiemannkj1 Feb 08 '16 at 14:43
  • Read the source code for Guava Cache and you will see why your solution is not viable, Google would not have written what they did unless it was a hard problem to solve. @Nathan explains why your solution is broken. –  Feb 08 '16 at 14:45
0

No, and ConcurrentHashMap would not help.

Recap: the double check idiom is typically about assigning a new instance to a variable/field; it is broken because the compiler can reorder instructions, meaning the field can be assigned with a partially constructed object.

For your setup, you have a distinct issue: the map.get() is not safe from the put() which may be occurring thus possibly rehashing the table. Using a Concurrent hash map fixes ONLY that but not the risk of a false positive (that you think the map has no entry but it is actually being made). The issue is not so much a partially constructed object but the duplication of work.

As for the avoidable guava cacheloader: this is just a lazy-init callback that you give to the map so it can create the object if missing. This is essentially the same as putting all the 'if null' code inside the lock, which is certainly NOT going to be faster than good old direct synchronization. (The only times it makes sense to use a cacheloader is for pluggin-in a factory of such missing objects while you are passing the map to classes who don't know how to make missing objects and don't want to be told how).

user2023577
  • 1,752
  • 1
  • 12
  • 23