Why is the local variable ks declared in the HashMap.keySet()?

Question

I looked at the source code java.util.HashMap and saw the following code:

public Set<K> keySet() {
    Set<K> ks;
    return (ks = keySet) == null ? (keySet = new KeySet()) : ks;
}

(Windows, java version "1.8.0_111")

On my MacBook it looks like this:

public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

(MacOs X Sierra, java version "1.8.0_121")

Why do both variants declare a local variable ks? Why is it not written like this:

public Set<K> keySet() {
    if (keySet == null) {
        keySet = new KeySet();
    }
    return keySet;
}

or

public Set<K> keySet() {
    return keySet == null ? (keySet = new KeySet()) : keySet;
}

@MarounMaroun I dont see it being volatile. The only modifier I see is `transient` — Cargeh, Sep 04 '17 at 10:03

score 12 · Answer 1 · answered Sep 04 '17 at 10:02

JavaDoc has the answer:

/**
 * Since there is no synchronization performed while accessing these fields,
 * it is expected that java.util.Map view classes using these fields have
 * no non-final fields (or any fields at all except for outer-this). Adhering
 * to this rule would make the races on these fields benign.
 *
 * It is also imperative that implementations read the field only once,
 * as in:
 *
 * public Set<K> keySet() {
 *   Set<K> ks = keySet;  // single racy read
 *   if (ks == null) {
 *     ks = new KeySet();
 *     keySet = ks;
 *   }
 *   return ks;
 * }
 *}
 */
transient Set<K> keySet;

you're right about the comment, I've tried to explain it a bit more — Eugene, Sep 05 '17 at 07:42

score 4 · Accepted Answer · answered Sep 05 '17 at 07:39

As far as I can tell this is an optimization that is pretty neat.

This was previously written like this:

if (keySet == null) { // volatile read
         keySet = new AbstractSet<K>() { // volatile write
          ....

return keySet; // volatile read

These operations can not be re-ordered because there are memory barriers that are inserted here. So it would look like this:

 [StoreLoad]
 // volatile read
 [LoadLoad]
 [LoadStore]

 [StoreStore]
 [LoadStore]
 // volatile write
 [StoreLoad]

 [StoreLoad] // there's probably just one barrier here instead of two
 // volatile read
 [LoadLoad]
 [LoadStore]

There are lots of barriers here and the most expensive would be the StoreLoad that is emitted on x86.

Suppose we drop the volatile here. Since there are no barriers inserted these operations can be re-ordered in any way they please and there are two racy reads here of the keySet variable.

We can have a single racy read and store the variable into a local field(since they are local, they are thread safe - no one can change the reference that is locally declared), the only problem as far as I can see is that multiple threads might see a null reference at the same time and initialize it with an empty KeySet and potentially doing too much work; but that is most probably cheaper than the barriers.

On the other hand if some threads sees a non-null reference, it will 100% see a fully initialized object and that is the comment about final fields. If all objects are final the JMM guarantees a "freeze" action after the constructor; or in simpler words (IMO) if all fields are final and initialized in the constructor there are two barriers inserted after it: LoadStore and LoadLoad; thus achieving the same effect.

Why is the local variable ks declared in the HashMap.keySet()?

2 Answers2

Linked

Related