76

I have been using Java's ConcurrentMap for a map that can be used from multiple threads. The putIfAbsent is a great method and is much easier to read/write than using standard map operations. I have some code that looks like this:

ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>();

// ...

map.putIfAbsent(name, new HashSet<X>());
map.get(name).add(Y);

Readability wise this is great but it does require creating a new HashSet every time even if it is already in the map. I could write this:

if (!map.containsKey(name)) {
    map.putIfAbsent(name, new HashSet<X>());
}
map.get(name).add(Y);

With this change it loses a bit of readability but does not need to create the HashSet every time. Which is better in this case? I tend to side with the first one since it is more readable. The second would perform better and may be more correct. Maybe there is a better way to do this than either of these.

What is the best practice for using a putIfAbsent in this manner?

Raedwald
  • 46,613
  • 43
  • 151
  • 237
Chris Dail
  • 25,715
  • 9
  • 65
  • 74
  • 3
    In your example, the Value-HashSet would need to be a ConcurrentHashSet too, else this is still not threadsafe. – Markus Kull Sep 20 '10 at 14:17
  • 1
    Tom Hawtin's solution is exactly what youre looking for – John Vint Sep 20 '10 at 14:20
  • 1
    As pointed out by Markus, the value type (the Set in this case) does need to be also thread-safe, as it may be accessed by multiple threads concurrently. – sjlee Sep 21 '10 at 05:47
  • Even if you use Tom Hawtin's response, which is incomplete and I believe to be equivalent to your own suggestion. – petersaints May 09 '13 at 16:50

6 Answers6

109

Concurrency is hard. If you are going to bother with concurrent maps instead of straightforward locking, you might as well go for it. Indeed, don't do lookups more than necessary.

Set<X> set = map.get(name);
if (set == null) {
    final Set<X> value = new HashSet<X>();
    set = map.putIfAbsent(name, value);
    if (set == null) {
        set = value;
    }
}

(Usual stackoverflow disclaimer: Off the top of my head. Not tested. Not compiled. Etc.)

Update: 1.8 has added computeIfAbsent default method to ConcurrentMap (and Map which is kind of interesting because that implementation would be wrong for ConcurrentMap). (And 1.7 added the "diamond operator" <>.)

Set<X> set = map.computeIfAbsent(name, n -> new HashSet<>());

(Note, you are responsible for the thread-safety of any operations of the HashSets contained in the ConcurrentMap.)

Tom Hawtin - tackline
  • 145,806
  • 30
  • 211
  • 305
  • 24
    +1 for "concurrency is hard" and using the returnvalue of putIfAbsent – Markus Kull Sep 20 '10 at 14:19
  • 1
    @Markus - +1 to you as well for pointing out the obvious, yet easy to ignore fact that reusing the returnvalue is a good practice. – Drupad Panchal Jun 09 '11 at 04:35
  • 1
    Good answer. Reminds me of double-checked locking: http://en.wikipedia.org/wiki/Double-checked_locking – Mansoor Siddiqui Oct 04 '12 at 20:41
  • I've been back to this answer more than once. :P – Stu Thompson Nov 01 '12 at 07:50
  • 1
    Wouldn't it still instantiate several instances of `HashSet` in case if several threads simultaneously failed the `if (set == null)` check? – zerkms Apr 02 '13 at 08:37
  • 2
    @zerkms That can happen, and the second `if (set == null)` handles that with the first thread through winning. The situation is unlikely, and the extra allocation time is unlikely to be significant. You could insert a temporary value in order to give a single thread exclusive access, but that's likely to make general performance and reliability worse. – Tom Hawtin - tackline Apr 02 '13 at 11:53
  • @TomHawtin-tackline But wouldn't the suggestion of using contains and get in the first question equivalent? – petersaints May 09 '13 at 16:47
  • 1
    @petersaints: there is no sense in calling `containsKey`, if you are calling `get` afterwards anyway (if `containsKey` returned `true`). Even worse, if `containsKey` returns `true` you still have no guaranty that the subsequent `get` will return a non-`null` value, as a concurrent update can change the situation in-between these two calls (that’s called the check-then-act anti-pattern). In principle, you can do the entire operation using a single `putIfAbsent` operation, but since `get` is known to be substantially faster, trying `get` first has a point. The unused `HashSet` is irrelevant. – Holger Feb 08 '17 at 17:41
  • What is the purpose of the second check `if (set == null) { set = value;}` please? We assign the **value** to **set** but does it present in the **map**? – Bằng Jan 07 '20 at 08:33
  • 1
    @CongBangDO `putIfAbsent` will return `null` (assigned to `set`) if the map now uses `value`. If the `name` key was already present, then `putIfAbsent` will return the existing retained value. – Tom Hawtin - tackline Jan 07 '20 at 14:20
16

Tom's answer is correct as far as API usage goes for ConcurrentMap. An alternative that avoids using putIfAbsent is to use the computing map from the GoogleCollections/Guava MapMaker which auto-populates the values with a supplied function and handles all the thread-safety for you. It actually only creates one value per key and if the create function is expensive, other threads asking getting the same key will block until the value becomes available.

Edit from Guava 11, MapMaker is deprecated and being replaced with the Cache/LocalCache/CacheBuilder stuff. This is a little more complicated in its usage but basically isomorphic.

Jed Wesley-Smith
  • 4,686
  • 18
  • 20
  • I just tried this and it is a great solution. You get all the benefits of ConcurrentMap without having to worry about the putIfAbsent idioms, which are easy to mess up. – Matt Friedman Jul 18 '12 at 17:38
7

You can use MutableMap.getIfAbsentPut(K, Function0<? extends V>) from Eclipse Collections (formerly GS Collections).

The advantage over calling get(), doing a null check, and then calling putIfAbsent() is that we'll only compute the key's hashCode once, and find the right spot in the hashtable once. In ConcurrentMaps like org.eclipse.collections.impl.map.mutable.ConcurrentHashMap, the implementation of getIfAbsentPut() is also thread-safe and atomic.

import org.eclipse.collections.impl.map.mutable.ConcurrentHashMap;
...
ConcurrentHashMap<String, MyObject> map = new ConcurrentHashMap<>();
map.getIfAbsentPut("key", () -> someExpensiveComputation());

The implementation of org.eclipse.collections.impl.map.mutable.ConcurrentHashMap is truly non-blocking. While every effort is made not to call the factory function unnecessarily, there's still a chance it will be called more than once during contention.

This fact sets it apart from Java 8's ConcurrentHashMap.computeIfAbsent(K, Function<? super K,? extends V>). The Javadoc for this method states:

The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple...

Note: I am a committer for Eclipse Collections.

Donald Raab
  • 6,458
  • 2
  • 36
  • 44
Craig P. Motlin
  • 26,452
  • 17
  • 99
  • 126
3

By keeping a pre-initialized value for each thread you can improve on the accepted answer:

Set<X> initial = new HashSet<X>();
...
Set<X> set = map.putIfAbsent(name, initial);
if (set == null) {
    set = initial;
    initial = new HashSet<X>();
}
set.add(Y);

I recently used this with AtomicInteger map values rather than Set.

karmakaze
  • 34,689
  • 1
  • 30
  • 32
  • As noted in the update to the accepted answer, Java 1.8 adds computeIfAbsent which achieves the same result and is far simpler. – karmakaze Dec 13 '16 at 04:34
  • The context this code needs to be used in is going to be tricky. It will cause an "interesting" to find bug down the road, if not straight off. I'm not convinced that it's a performance win either. (Also you need to lock access to the `HashSet`.) – Tom Hawtin - tackline Feb 08 '17 at 18:49
3

In 5+ years, I can't believe no one has mentioned or posted a solution that uses ThreadLocal to solve this problem; and several of the solutions on this page are not threadsafe and are just sloppy.

Using ThreadLocals for this specific problem isn't only considered best practices for concurrency, but for minimizing garbage/object creation during thread contention. Also, it's incredibly clean code.

For example:

private final ThreadLocal<HashSet<X>> 
  threadCache = new ThreadLocal<HashSet<X>>() {
      @Override
      protected
      HashSet<X> initialValue() {
          return new HashSet<X>();
      }
  };


private final ConcurrentMap<String, Set<X>> 
  map = new ConcurrentHashMap<String, Set<X>>();

And the actual logic...

// minimize object creation during thread contention
final Set<X> cached = threadCache.get();

Set<X> data = map.putIfAbsent("foo", cached);
if (data == null) {
    // reset the cached value in the ThreadLocal
    listCache.set(new HashSet<X>());
    data = cached;
}

// make sure that the access to the set is thread safe
synchronized(data) {
    data.add(object);
}
Nathan
  • 937
  • 9
  • 15
  • This is 'thread-shared'. The ThreadLocal is to prevent unnecessary object creation (Map, and consequently Set creation is not cheap) during the "putIfAbsent()" call. The set is correctly and safely published to all threads. – Nathan Feb 03 '16 at 17:22
  • And when combined with `putIfAbsent()`, only one will win. Thus `data` will always be the same across all threads. – Nathan Feb 04 '16 at 18:49
  • I don't think the performance of this will be great. You have to go through a `ThreadLocal` in addition every time you just needed a `ConcurrentMap.get`. (And you access of the `HashSet` **is not threadsafe**.) – Tom Hawtin - tackline Feb 08 '17 at 18:57
  • 1
    It hits the lock on the concurrent map, so according to JSL Chapter 17.4 and the general behavior of locks, (edit) it is visible, **but not threadsafe**. Thank you for pointing that out, I've updated the answer. You are right about the performance ... however if one is wanting performance, one would not use a concurrent map nor the (default) thread local, but instead very different data structures; which is way, way beyond the scope of this question. – Nathan Feb 09 '17 at 22:36
  • I should also point out that the performance hit on ThreadLocal is way less than the performance hit on creating and instantiating new objects, specifically a Set... – Nathan Feb 09 '17 at 22:42
0

My generic approximation:

public class ConcurrentHashMapWithInit<K, V> extends ConcurrentHashMap<K, V> {
  private static final long serialVersionUID = 42L;

  public V initIfAbsent(final K key) {
    V value = get(key);
    if (value == null) {
      value = initialValue();
      final V x = putIfAbsent(key, value);
      value = (x != null) ? x : value;
    }
    return value;
  }

  protected V initialValue() {
    return null;
  }
}

And as example of use:

public static void main(final String[] args) throws Throwable {
  ConcurrentHashMapWithInit<String, HashSet<String>> map = 
        new ConcurrentHashMapWithInit<String, HashSet<String>>() {
    private static final long serialVersionUID = 42L;

    @Override
    protected HashSet<String> initialValue() {
      return new HashSet<String>();
    }
  };
  map.initIfAbsent("s1").add("chao");
  map.initIfAbsent("s2").add("bye");
  System.out.println(map.toString());
}
ggrandes
  • 2,067
  • 22
  • 16