Why not use values variable directly, the vs variable is not necessary

Question

Here is from HashMap：

transient Collection<V> values;

public Collection<V> values() {
    Collection<V> vs = values;
    if (vs == null) {
        vs = new Values();
        values = vs;
    }
    return vs;
}

I wonder why not use member variable values directly? Why create the local variable named vs?

How is that better than:

transient Collection<V> values;

public Collection<V> values() {
    if (values == null) {
        values = new Values();
    }
    return values;
}

The second version has a race condition. `values` could be non-null, then another thread makes it null, which would cause this method to return a null. — matt, Dec 12 '19 at 09:10
@matt `HashMap` is not meant to be thread safe. I don't think either is thread safe really. — Amongalen, Dec 12 '19 at 09:12
@matt it has nothing to do with race condition here, don't spread false assumption — Mạnh Quyết Nguyễn, Dec 12 '19 at 09:24
Related, possibly duplicate: [assign instance field to local variable](https://stackoverflow.com/questions/37957376/assign-instance-field-to-local-variable) — Mark Rotteveel, Dec 12 '19 at 09:29
Guys, @matt didn't talk about "thread-safety". He talked about "race conditions"...and he is right to some extent. There is a race condition there that can lead to inconsistent behaviour. HashMap may not be thread-safe but as ``values()`` changes its internal state it should at least behave consistently. — Filippo Possenti, Dec 12 '19 at 09:29
how the race conditions solved when both thread initialize it own empty `Values()`? @FilippoPossenti — Mạnh Quyết Nguyễn, Dec 12 '19 at 09:31
Does this answer your question? [In ArrayBlockingQueue, why copy final member field into local final variable?](https://stackoverflow.com/questions/2785964/in-arrayblockingqueue-why-copy-final-member-field-into-local-final-variable) — Mạnh Quyết Nguyễn, Dec 12 '19 at 09:31
@MạnhQuyếtNguyễn it isn't a false statement. It's probably not be 'the reason' for the local access. — matt, Dec 12 '19 at 09:31
@MạnhQuyếtNguyễn - To be fair, that particular quote may not apply to this case, where using the local actually produces larger bytecode, not smaller. More efficient, but not smaller. — T.J. Crowder, Dec 12 '19 at 09:37
@MạnhQuyếtNguyễn The example is in regards to a final field, which is not the case here. Although the second answer addresses the fact a final field could change. — matt, Dec 12 '19 at 09:38
@MạnhQuyếtNguyễn the point is that the second implementation will re-read the external field, thus potentially returning something different than the instance created internally. The first implementation on the other hand, reads the external field only once and then works on the internal variable, thus guaranteeing that if you get inside the "if" condition the returned value will always be that of the newly created instance. So the second implementation may behave inconsistently whereas the first one will always behave consistently. — Filippo Possenti, Dec 12 '19 at 09:56
@FilippoPossenti Don't talk about fancy thing please. Please explain what `consistency` that implementation tries to achieve. Both thread got it own empty `new Values()` is consistency? I think not. Hence it does not related to fancy consistency anyway — Mạnh Quyết Nguyễn, Dec 12 '19 at 10:05
@MạnhQuyếtNguyễn The first version gives you consistent results because it will return a `new Values();` created locally or the `vs` that was checked for null. It will do that in either a multi-threaded or single threaded environment, hence the term `consistent`. The second version can return a different values than was checked for null, or a different `new Values()` that was created on a different thread. I really don't understand why you're complaining so much about this, it is a simple aspect of the code, and maybe not the most relevant. — matt, Dec 12 '19 at 11:30
@MạnhQuyếtNguyễn, nothing fancy here. He is asking what could favour choosing one implementation over another and this consistency behaviour spotted by Matt is definitely something to be aware of when choosing one over another, like it or not. You are free to disagree about its importance in your personal experience, but the difference in behaviour remains nonetheless and the choice of one implementation over another CAN indeed affect the behaviour of a multi-threaded application, making one implementation preferable over the other depending on how the class is going to be used. — Filippo Possenti, Dec 12 '19 at 11:52

T.J. Crowder · Accepted Answer · 2019-12-12T09:44:11.690

Local variables are held on the stack and are very fast, faster than reading a field, even before the JVM optimizes the method (if it bothers to, the method may not be called enough in a given program). In the common case, values is not null, so it's retrieved once, then the local is tested and used as the return value. If you didn't use a local variable, you'd have to read the field twice (once for the null check, once for the return). That isn't as fast as reusing the local.

There may also be a general style rule (related to efficiency) being applied that doesn't necessarily show its benefits in values(), specifically.

Let's look at it in more depth. Consider these two implementations of a values-like method, one using a local and the other using the field everywhere:

private transient Map<T, U> map1 = null;
public Map<T, U> likeValues() {
    Map<T, U> m = this.map1;
    if (m == null) {
        m = new HashMap<T, U>();
        this.map1 = m;
    }
    return m;
}

private transient Map<T, U> map2 = null;
public Map<T, U> usingField() {
    if (this.map2 == null) {
        this.map2 = new HashMap<T, U>();
    }
    return this.map2;
}

Let's look at the bytecode for them, I've added comments (they may not be perfect, I'm not a bytecode expert and moreover I'm going for clarity rather than describing each and every stack operation in detail):

public java.util.Map likeValues();
  Code:
     0: aload_0             // Load `this` onto the stack
     1: getfield      #2    // Get the field's value
     4: astore_1            // Store it in local variable 1
     5: aload_1             // Load local variable 1
     6: ifnonnull     22    // Jump if not null to #22
     9: new           #5    // Create the HashMap
    12: dup                 // Duplicate the top value on the stack
    13: invokespecial #6    // Call method java/util/HashMap."":()V to init the HashMap
    16: astore_1            // Store the result in local variable 1
    17: aload_0             // Load `this` onto the stack
    18: aload_1             // Load local variable 1 onto the stack
    19: putfield      #2    // Store the value in the field
    22: aload_1             // Load local variable 1 onto the stack
    23: areturn             // Return it

public java.util.Map usingField();
  Code:
     0: aload_0             // Load `this` onto the stack
     1: getfield      #3    // Get the field's value
     4: ifnonnull     18    // Jump if not null to #18
     7: aload_0             // Load `this` onto the stack
     8: new           #5    // Create the HashMap
    11: dup                 // Duplicate the top value on the stack                          
    12: invokespecial #6    // Call method java/util/HashMap."":()V to init the HashMap
    15: putfield      #3    // Store it in the field
    18: aload_0             // Load `this` onto the stack
    19: getfield      #3    // Get the field's value
    22: areturn             // Return it

Let's look only at the non-null path in those:

public java.util.Map likeValues();
  Code:
     0: aload_0             // Load `this` onto the stack
     1: getfield      #2    // Get the field's value
     4: astore_1            // Store it in local variable 1
     5: aload_1             // Load local variable 1
     6: ifnonnull     22    // Jump if not null to #22
    // ...skipped...
    22: aload_1             // Load local variable 1 onto the stack
    23: areturn             // Return it

public java.util.Map usingField();
  Code:
     0: aload_0             // Load `this` onto the stack
     1: getfield      #3    // Get the field's value
     4: ifnonnull     18    // Jump if not null to #18
    // ...skipped...
    18: aload_0             // Load `this` onto the stack
    19: getfield      #3    // Get the field's value
    22: areturn             // Return it

In likeValues, the field is only read once, and then the local variable (on the stack) is used for the comparison and return.

In usingField, the field is read twice. Reading a field is not as fast as using a stack value.

If I understand OPs question, he found this code inside of the `HashMap` implementation - it has to be super optimized, doesn't it. — Amongalen, Dec 12 '19 at 08:57
@Amongalen - Right, some methods in the JDK classes are written to be fairly efficient out of the box and very optimizable by the JVM. The JDK wasn't always like that, but implementations have been improved over time. — T.J. Crowder, Dec 12 '19 at 09:00
I think that is the point of this question. On the low level, why did JDK devs decided to use a local variable instead of comparison on the instance variable. For a simple mind, additional assignment to the local variable is nonoptimal. — Amongalen, Dec 12 '19 at 09:04
@MuratKaragöz - You'd have to re-read the field to return it. — T.J. Crowder, Dec 12 '19 at 09:06
@Amongalen - I must not have explained very well above, I thought "local variables are **very** fast" was clear enough. I'll expand. — T.J. Crowder, Dec 12 '19 at 09:07
Yep, from your answer it isn't clear that using a local variable is **faster** than instance variable. Just being "fast" doesn't mean much :) — Amongalen, Dec 12 '19 at 09:09
Might be my stubborness but the differences go beyond speed. The two methods actually behave differently in a multi-threaded environment. The bytecode (and its double use of getfield, in particular) confirms it too. I'm not saying there is no performance benefit (even though I would like to know how much benefit are we talking about) but the behaviour of the two implementations is definitely different and it's actually an important difference when debugging. — Filippo Possenti, Dec 12 '19 at 10:02
@FilippoPossenti - Yes, they do behave differently in a multi-threaded environment, but I don't see any significance in that difference vis-a-vis debugging. Neither is thread-safe; in both, simultaneous calls to `values()` may result in the same or different `Values` instances; in both, a call to `values()` may result in a `Values` instance that the `HashMap` doesn't hold on its `values` field (if another simultaneous call occurs). So yes, they're different, but I'm not seeing any value to that difference. :-) — T.J. Crowder, Dec 12 '19 at 10:13
Looks like a small saving. The only access to `vs` that doesn't have a corresponding access to `values` is the return. And how many times do we call `values()`, compared to, for example the overhead of actually iterating through the values? That said, this is one of the very things that local variables are *for*, caching a value so it doesn't have to be re-read, (or re-computed, in other cases), and I would do this as a matter of habit, unless there was a good reason not to. — Rodney, Dec 12 '19 at 10:29
@T.J.Crowder I do agree that it's a subjective opinion on whether this difference in behaviour is important or not. To me it's important as I'm assuming the objective here is not to re-write the JVM but rather understand the implications of certain choices in every-day programming tasks and the two proposed implementations trade certain features in exchange for others... and this difference in behaviour is one of the tradeoffs, just like performance is another one. — Filippo Possenti, Dec 12 '19 at 11:34
@T.J.Crowder, to further build on what I'm saying in such a way that will add value to your answer: I don't disagree with your assessment on performance, as I don't know the relative speed of the various bytecode instructions. All I'm saying is that your answer could benefit from an addition... namely the different behaviour the two implementations may exhibit in a multi-threading scenario. Basically, I would be happy to be able to delete my answer in favour of yours... but right now I just can't. :) — Filippo Possenti, Dec 12 '19 at 11:42
@FilippoPossenti - No reason to do that, let yours stand. It's another valid answer to the question, after all! :-) — T.J. Crowder, Dec 12 '19 at 11:58

Filippo Possenti · Answer 2 · 2019-12-12T15:07:59.880

Although the two implementations look very similar, there is one major difference.

Let's say you are in a multi-threaded application and Thread 2 changes the reference to values while Thread 1 is executing the values() method. This is as easy as having both threads call the values() method roughly at the same time.

With the first implementation changing such value will have no effect on the value being returned by values() in a given thread as it comes from the vs variable. With the second implementation on the other hand there is a chance that the call to values() will yield something different from the outcome of new Values() as the return value comes from the shared values variable.

In other words, the HashMap version will behave consistently in a multi-threaded environment (note that I didn't call it "thread-safe", as that is normally used with a broader meaning) whereas your version will behave inconsistently, thus potentially causing problems that are very difficult to track down in multi-threaded applications.

Why not use values variable directly, the vs variable is not necessary

2 Answers2