15

Some Guava internal types, like AbstractMultiset, have a pattern like this:

private transient Set<E> elementSet;

@Override
public Set<E> elementSet() {
  Set<E> result = elementSet;
  if (result == null) {
    elementSet = result = createElementSet();
  }
  return result;
}

Set<E> createElementSet() {
  return new ElementSet();
}

The idea is to delay creating the collection views (elementSet(), entrySet()) until they're actually needed. There's no locking around the process because if two threads call elementSet() at the same time, it's okay to return two different values. There will be a race to write the elementSet field, but since writes to reference fields are always atomic in Java, it doesn't matter who wins the race.

However, I worry about what the Java memory model says about inlining here. If createElementSet() and ElementSet's constructor both get inlined, it seems like we could get something like this:

@Override
public Set<E> elementSet() {
  Set<E> result = elementSet;
  if (result == null) {
    elementSet = result = (allocate an ElementSet);
    (run ElementSet's constructor);
  }
  return result;
}

This would allow another thread to observe a non-null, but incompletely initialized value for elementSet. Is there a reason that can't happen? From my reading of JLS 17.5, it seems like other threads are only guaranteed to see correct values for final fields in elementSet, but since ElementSet ultimately derives from AbstractSet, I don't think there's a guarantee that all its fields are final.

Tavian Barnes
  • 12,477
  • 4
  • 45
  • 118
  • how should inlining move the constructor after the assignment? – zapl Apr 03 '14 at 15:53
  • 2
    @zapl - IIRC, this is allowed; basically: allocate the memory for the new instance (and assign the reference), then call the constructor. This [article](http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html) explains it. – Dirk Apr 03 '14 at 15:55
  • Shouldn't `transient` be `volatile`? – OldCurmudgeon Apr 03 '14 at 16:04
  • @zapl Example 17.5-1 in the link I posted seems to contradict that, no? – Tavian Barnes Apr 03 '14 at 16:04
  • @OldCurmudgeon The code is copy-pasted, I'm sure the `transient` is intended. Whether it should be `volatile` too is a different question. – Tavian Barnes Apr 03 '14 at 16:05
  • @TavianBarnes no, that would apply if `elementSet` was a field of the allocated new object. If you were to leak a `this` reference of the new object, you could see that field unitialitzed. But here you have 2 different objects. The one that is being initialized is fully initialized before you can assign it since you are outside of it. There will be a race on the `elementSet` field but that's "acceptable". http://stackoverflow.com/q/18948990/995891 in the comments is some related discussion about that pattern – zapl Apr 03 '14 at 16:16
  • @zapl I'm not convinced. In example 17.5-1, the field being updated is static. It seems to me that it could just as well be a (static or instance) field in a different object. IOW, I think all of [these](https://gist.github.com/tavianator/9957738) have the same semantics. Also I know about the String hashCode() caching trick, but that's definitely safe because `int` is a primitive type. – Tavian Barnes Apr 03 '14 at 16:27
  • http://jeremymanson.blogspot.co.uk/2008/12/benign-data-races-in-java.html "32 bit or object references" (blog post by JMM author) (also note that it is returning `result` and not `elementSet`) – zapl Apr 03 '14 at 16:35
  • @zapl "Edited to add: I just noticed that this was misleading. An object reference will point to the right object, but the contents of the object aren't guaranteed to be correct unless the object is immutable." – Tavian Barnes Apr 03 '14 at 17:19
  • Oh.. I was much wrong :) And I guess I've understood why `final` fields are threadsafe (because the reordering of constructor execution/reference publishing is still permissible under the new JMM even for properly constructed objects that don't `this` leak, `final` guarantees that the field is visible in that case..) Btw: [JCIP](http://jcip.net/) chapter 16.2. explains it nicely. – zapl Apr 03 '14 at 18:26

1 Answers1

8

I'm not 100% clear on this (I'm sure someone else on our team could answer this better). That said, a couple thoughts:

  1. I don't think we claim anywhere that this is (guaranteed to be) thread-safe. Non-thread-safe collections such as HashMultiset extend AbstractMultiset. That said, ConcurrentHashMultiset also extends AbstractMultiset and uses its implementation of elementSet(), so presumably it must in fact be possible for it to be thread-safe.
  2. I believe the thread safety of this method is dependent on the implementation of createElementSet(). From what I can tell, if the Set created by createElementSet() is immutable (in that the fields that are assigned when it is constructed are final), it should be thread-safe. This appears to be true in the case of ConcurrentHashMultiset at least.

Edit: I asked Jeremy Manson about this, and he said: "Your take on it seems fine to me. It isn't thread safe. If the object being constructed has all of the final fields in the right places, you should be fine, but I wouldn't rely on that by accident (note that many implementations are effectively immutable instead genuinely immutable)."

Note: For thread-safe collections like ConcurrentHashMultiset which use this pattern, objects created are intentionally and genuinely immutable (though if AbstractSet were to change, that could change, as Chris noted in the comments).

ColinD
  • 108,630
  • 30
  • 201
  • 202
  • 1
    RE: 1, I think it ought to be safe for two threads to call `elementSet()` simultaneously without synchronization, even on something like a `HashMultiset`, as long as they don't mutate the result. – Tavian Barnes Apr 03 '14 at 17:29
  • RE: 2, that seems true, you certainly wouldn't want `createElementSet()` to do any work after constructing the object. I agree that it should work if the element set is immutable (only `final` fields), but I don't think the default implementation guarantees that (because `ElementSet` ultimately derives from the JDK class `AbstractSet`). – Tavian Barnes Apr 03 '14 at 17:30
  • 1
    I think that everything in the question, answer, and responses is correct. Fortunately, I would be surprised if `AbstractSet` were ever changed to initialize non-`final` fields in its constructor. But stranger things have happened. Maybe `AbstractSet` would be defined differently under an alternative environment. If so, things could break. So, in principle, we ought to fix this. In practice, it's a question of priorities. I wonder how much code we'd have to rewrite. Anyway, you could file a bug: https://code.google.com/p/guava-libraries/issues/entry – Chris Povirk Apr 03 '14 at 17:45
  • (In all I said, I'm implicitly agreeing with your thinking that `HashMultiset` and friends should be safe to _read_ concurrently from multiple threads.) – Chris Povirk Apr 03 '14 at 17:49
  • @ChrisPovirk I may do that. There's other uses of this pattern (`AbstractMultimap` has many more), so if I find one that could fail in practice I'll make it higher priority. By the way would marking the field `volatile` fix it? – Tavian Barnes Apr 03 '14 at 18:31
  • @TavianBarnes: I'd bet it would ([happens-before](http://en.wikipedia.org/wiki/Volatile_variable#In_Java)), but volatile isn't free. I guess it's not needed as all immutable classes create immutable key/value/entry sets/collections and `ConcurrentHashMultiset` does it right. In general, there are hardly any fields and if there's one, it's final (the compiler-generated reference to the enclosing class is final, too). – maaartinus Apr 03 '14 at 19:27
  • @ChrisPovirk Thanks. FWIW the JDK's AbstractMap uses the same pattern, but with `volatile` added: http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/classes/java/util/AbstractMap.java#l294 – Tavian Barnes Apr 03 '14 at 20:22
  • 1
    Huh, they changed it for JDK 9: http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/ce72c7641f38 – Tavian Barnes Jun 21 '16 at 20:17
  • It may be technically thread-safe, but making it non-`volatile` means it could be constructed multiple times. Maybe it's worth the tradeoff for simple view classes, but there are some classes that lazily populate a whole new data structure, like `ImmutableMap.inverse()`. Wouldn't it make sense to use DCL in those cases? – shmosel Apr 17 '17 at 00:46