10

I'm new to programming and Java. I've noticed that, in the Java API, there are methods with strange assignments inside if statements.

Here is an example from the Map interface:

default V replace(K key, V value) {
    V curValue;
    if (((curValue = get(key)) != null) || containsKey(key)) {
        curValue = put(key, value);
    }
    return curValue;
}

Is there some sort of benefit to nesting the assignment this way? Is this purely a style choice? Why not just do the assignment when curValue is first declared?

// why not do it like this?
default V replace(K key, V value) {
    V curValue = get(key); // not nested
    if (curValue != null || containsKey(key)) {
        curValue = put(key, value);
    }
    return curValue;
}

I've noticed this in a lot of the newly added Java 8 methods in the Map interface and elsewhere. This form of nesting the assignment seems unnecessary.

Edit: another example from the Map interface:

default V computeIfAbsent(K key,
        Function<? super K, ? extends V> mappingFunction) {
    Objects.requireNonNull(mappingFunction);
    V v;
    if ((v = get(key)) == null) {
        V newValue;
        if ((newValue = mappingFunction.apply(key)) != null) {
            put(key, newValue);
            return newValue;
        }
    }

    return v;
}
Vince
  • 14,470
  • 7
  • 39
  • 84
Ivan
  • 239
  • 1
  • 13
  • Usually it's good practice to initialize variables when they're needed rather than when they're declared. However, this example doesn't seem to provide any benefit, so it was most likely personal preference. – Jacob G. Feb 07 '18 at 00:23
  • 5
    @JacobG. *Usually it's good practice to initialize variables when they're needed rather than when they're declared.* It's considered good practice to only declare variables where they're needed, so that's kind of a moot point. – shmosel Feb 07 '18 at 00:27
  • @shmosel I agree for the most part. However, if the conditions were swapped in this if-statement and the initialization of `curValue` was expensive, then I'd prefer initializing it when its condition has been reached rather than before the `if` block (in the case of a short-circuit where `curValue` wouldn't have to have been initialized). – Jacob G. Feb 07 '18 at 00:30
  • @JacobG. Right. That's one of the only times I find myself using this pattern. – shmosel Feb 07 '18 at 00:33
  • 1
    I see that this pattern compiles into a dup+store, whereas initializing it first compiles into a store+load. Does anyone want to benchmark whether that JITs differently? – that other guy Feb 07 '18 at 00:39
  • 1
    `curValue` will not maintain the value returned by `get(key)`. Instead, the value it really exists for is the returned result of `put`. It seems the dev wanted to emphasize this. It seems they're emphasizing the fact that `get` is only used for the null check, and that the value it may return is redundant. – Vince Feb 07 '18 at 00:39
  • 3
    They're not 'unncessary', whether 'seemingly' or otherwise. They're just in a different place from what you're 'seemingly' expecting. – user207421 Feb 07 '18 at 00:40
  • 1
    No benefit, it’d be better to do that assignment right away. But sometimes people prefer showing off instead of writing clean and readable code. – algrid Feb 07 '18 at 00:41
  • 1
    Isn't that the same as `return containsKey(key) ? put(key, value) : null`? Maybe optimization reasons? – giorgiga Feb 07 '18 at 01:17
  • 1
    @VinceEmigh that's a good theory. But what about when the initial return value is used outside the if statement? Such as in http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/Map.java#l586 or in `computeIfAbsent` in the same interface. – Ivan Feb 07 '18 at 01:29
  • 1
    @giorgiga I suppose, it stems from copy&paste&adapting another operation; I see no advantage of that complex code over your variant. But for most relevant `Map` types, this `default` method has been overridden anyway. – Holger Feb 07 '18 at 13:13
  • @Holger now that I think of it, it may be to not break (erroneous, but common) "default map" implementations - eg: https://stackoverflow.com/a/4833375/428627 – giorgiga Feb 07 '18 at 13:17
  • @giorgiga even then, you don’t need to deviate much from your simplification: `return get(key)!=null || containsKey(key)? put(key, value): null;`, the key point is that the assignment of the question is entirely obsolete as its result is only used when it is known to be `null`… – Holger Feb 08 '18 at 12:39
  • Even stranger, [`putIfAbsent` doesn't use this idiom](http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/e4e9f6455f3c#l4.196). It seems [@MikeDuigo](https://stackoverflow.com/users/1088831/mike-duigou) was the one to commit this. Maybe he could shine some light on this? – Vince Feb 10 '18 at 16:52
  • Is this question related to https://stackoverflow.com/questions/28975415/why-jdk-code-style-uses-a-variable-assignment-and-read-on-the-same-line-eg-i ? – Ivan Feb 25 '18 at 18:52

2 Answers2

3

What this is doing is actually copying to a local variable, this is producing smaller byte code, and it is seen as an absolute extreme way of optimization, you will see this in numerous other places in the jdk code.

One other thing is that reading a local variable multiple times, implies reading a shared variable only once, if that for example would have been a volatile and you would read it only once and work with it within the method.

EDIT

The difference between the two approaches is a single read AS FAR AS I CAN TELL

Suppose we have these two methods:

V replace(K key, V value) {
    V curValue;
    if ((curValue = map.get(key)) != null || map.containsKey(key)) {
        curValue = map.put(key, value);
    }
    return curValue;
} 

V replaceSecond(K key, V value) {
    V curValue = map.get(key); // write
    if (curValue != null || map.containsKey(key)) { // read
        curValue = map.put(key, value); // write
    }
    return curValue;
}

The byte code for this is almost identical, except for: replaceSecond is going to have:

 astore_3 // V curValue = map.get(key); store to curValue
 aload_3  // curValue != null; read the value from curValue

While the replace method is going to be:

 dup      // duplicate whatever value came from map.get(key)
 astore_3 // store the value, thus "consuming" it form the stack

In my understanding, dup does not count as yet another read, so I guess this is what is referred as an extreme optimization?

Eugene
  • 117,005
  • 15
  • 201
  • 306
  • Hi @Eugene! I was about to answer about locality, but you wrote this first. Maybe you want to expand with something about locality, why it's good (I believe its main benefit is that it produces more cache hits at the instruction level, but I might be wrong here, you'd need to confirm). I think that code inside methods that have good locality is more easily predictable, and by this I mean that compilers can predict what parts to optimize aggressively, but again, I'm not sure about anything of this. Just to give you some background that is arising from the depths of my memory... – fps Feb 07 '18 at 13:22
  • Disregard my previous comment. Locality is maintained, both now and if get(key) was invoked when curValue is being declared. – fps Feb 07 '18 at 13:27
  • What do you mean by "copying to a local variable"? What does the assignment outside of the if statement do instead? – Ivan Feb 08 '18 at 03:35
  • @Ivan *outside* of the if statement? I don't get it, are you not showing the entire code may be? As far as copying to a local variable goes, you are operating only with a local variable `curValue` instead of a `shared` variable `map.get(....)`; that's common pattern in multi-thread code btw, where you would read once and operate on the value that you have read, operation with it *on the stack* – Eugene Feb 08 '18 at 08:11
  • @Ivan and here may be is a better way to express the same words (I might have a slow morning today)... https://stackoverflow.com/questions/37776179/is-it-important-to-copy-a-reference-to-a-local-variable-before-using-it – Eugene Feb 08 '18 at 08:20
  • Reading a local variable **never** implies reading a shared variable, as local variables are unshared, by definition. That’s why it is impossible to declare a local variable as `volatile`, it wouldn’t make any sense. – Holger Feb 08 '18 at 12:20
  • @Holger veeeery bad wording from my side :( in this example I meant, unless you declare the local variable, it will imply reading the shared one – Eugene Feb 08 '18 at 12:33
  • This doesn't explain why they nested the assignment. Could have done `V curValue = map.get(key);` – Vince Feb 08 '18 at 14:28
  • Agree with @VinceEmigh. How does this explain the nested assignment? – Ivan Feb 09 '18 at 17:24
  • @VinceEmigh I only took a guess here, I hope I am right (see edit) – Eugene Feb 13 '18 at 15:02
  • Yes, it's correct, it's what the other answer was focusing on. This could be why they did it, seeing how these operations may be used quite often in large-scale applications. Although it begs the question: why didn't they apply this to `putIfAbsent`? Maybe that operation isn't used as often, clarity was prioritized in that case. Point is, at least this answer now focuses on the primary question. +1 – Vince Feb 14 '18 at 01:08
  • As said in [my previous comment](https://stackoverflow.com/questions/48654075//48658606#comment84367963_48658606) local variables are not shared, hence, it is entirely irrelevant how often they are read. For practical implications, consider [this answer](https://stackoverflow.com/a/48642917/2711488). Since this read can be encoded as `aload_3` rather than `aload n`, there isn’t even a difference in bytecode size here. – Holger Feb 14 '18 at 07:36
  • @Holger thank you, so this is entirely useless here... :) – Eugene Feb 14 '18 at 08:21
  • @Holger From what I've been reading (haven't dumped the bytecode myself yet), the `aload` instruction is replaced by a `dup` instruction. Wouldn't a `dup` instruction be cheaper, since the compiler is simply duplicating a value that'll already exist on the stack at that point? – Vince Mar 07 '18 at 00:34
  • @VinceEmigh both operations just perform a copy operation within the same stack frame, but that would matter only for an interpreter that truly follows that model. Compilers never actually execute these transfer instructions. They only use them to determine the actual source and use of data (which is the same in both cases). See [Data-flow analysis](https://en.wikipedia.org/wiki/Data-flow_analysis) and [Static single assignment form](https://en.wikipedia.org/wiki/Static_single_assignment_form) (which is used by the HotSpot JVM… – Holger Mar 07 '18 at 08:31
2

There is a almost no difference in the generated bytecode (One instruction difference): https://www.diffchecker.com/okjPcBIb

I wrote this to generate the instructions and pretty print them:

package acid;

import jdk.internal.org.objectweb.asm.ClassReader;
import jdk.internal.org.objectweb.asm.tree.ClassNode;
import jdk.internal.org.objectweb.asm.tree.InsnList;
import jdk.internal.org.objectweb.asm.util.Printer;
import jdk.internal.org.objectweb.asm.util.Textifier;
import jdk.internal.org.objectweb.asm.util.TraceMethodVisitor;

import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;

public class Acid {
    public interface Map<K,V> {
        default V replace(K key, V value) {
            V curValue;
            if (((curValue = get(key)) != null) || containsKey(key)) {
                curValue = put(key, value);
            }
            return curValue;
        }

        boolean containsKey(Object key);
        V get(Object key);
        V put(K key, V value);
    }


    public void print() {

        try {
            ClassNode node = loadRelativeClassNode(Map.class.getName());
            node.methods.stream().filter(m -> m.name.equals("replace")).forEach(m -> {

                System.out.println("\n\nMethod: " + m.name + "" + m.desc + "\n");
                System.out.println("-------------------------------\n");

                Printer printer = new Textifier();
                TraceMethodVisitor visitor = new TraceMethodVisitor(printer);
                Arrays.stream(m.instructions.toArray()).forEachOrdered(instruction -> {
                    instruction.accept(visitor);
                    StringWriter writer = new StringWriter();
                    printer.print(new PrintWriter(writer));
                    printer.getText().clear();
                    System.out.print(writer.toString());
                });
            });

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    //Usage: `loadJVMClassNode("java.util.Map")`
    private static ClassNode loadJVMClassNode(String cls) throws IOException, ClassNotFoundException {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        Class clz = loader.loadClass(cls);
        InputStream url = clz.getResourceAsStream(clz.getSimpleName() + ".class");
        ClassNode node = new ClassNode();
        ClassReader reader = new ClassReader(url);
        reader.accept(node, ClassReader.SKIP_DEBUG | ClassReader.SKIP_FRAMES);
        return node;
    }

    //Usage: `loadJVMClassNode(Acid.Map.class.getName())`
    private static ClassNode loadRelativeClassNode(String cls) throws IOException, ClassNotFoundException {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        Class clz = loader.loadClass(cls);
        InputStream url = clz.getResourceAsStream(("./" + clz.getName() + ".class").replace(clz.getPackage().getName() + ".", ""));
        ClassNode node = new ClassNode();
        ClassReader reader = new ClassReader(url);
        reader.accept(node, ClassReader.SKIP_DEBUG | ClassReader.SKIP_FRAMES);
        return node;
    }
}

Usage: new Acid().print();

Output Difference is a single DUP instruction vs. ALOAD instruction..

For those that say.. well your interface isn't the Java JDK's interface.. I also did a diff: https://www.diffchecker.com/zBVTu7jK .

I'm very confident JIT will see the them as the exact same code regardless of whether you initialize the variable outside the if-statement or within it..

All code above was ran on:

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

OSX High-Sierra 10.13.3
IDE: Intelli-J

All in all, it's personal preference..

Brandon
  • 22,723
  • 11
  • 93
  • 186
  • Performance isn't the only factor here. *Why* would someone prefer this style of nested assignments, which only appear in SOME areas? Some may believe it's for performance, which you're saying there seems to no apparent performance gain. These methods were comitted by the same developer, so the question still remains: what's the deal? It could be a design/readability concern we haven't noticed yet, or it could be some deep-down optimization that you haven't unearthed with your tests. May wanna analyze the JITed output to truly confirm your "personal preference" claim. – Vince Feb 10 '18 at 17:19
  • (Continued) Not to mention, some devs are aware of future updates, writing their code in preparation for the next update. Some Java 8 features were pushed back, so this could be something which benefits in Java 9. If you care to fully debunk the relation to performance, please try Java 9 & upload JIT logs (or even just use JITWatch). I'd understand if you don't want to, would be excess work for potentially the same answer. I just don't feel analyzing only the bytecode for one version really does true justice for this mystery. – Vince Feb 10 '18 at 17:25
  • @VinceEmigh; You think that assigning a variable outside vs. inside will cause a deep down optimization that isn't unearthed in the instructions but only in the JIT? It's an assignment. At the end of the function it returns the result. That's it. There's nothing more to it. There is even other languages that we do this in: https://stackoverflow.com/questions/151850/why-would-you-use-an-assignment-in-a-condition It's a style that is inherent from C and C++. There is one case where it would be optimized out and that is short-hand evaluation.. – Brandon Feb 10 '18 at 18:13
  • For example: `if (condition || ((err = someFunc()) == null))`.. Then `condition` is true, `someVar` would be unassigned/null because of short-circuit evaluation: https://en.wikipedia.org/wiki/Short-circuit_evaluation But that is the only optimization that could happen and requires conditions before the assignment. To not lose such a thing, many programmers adopted the style of putting assignments inside the if-statement. C99 recently allowed declarations inside if-statements as well. But that's about it. – Brandon Feb 10 '18 at 18:18
  • There is also one other case.. `Repetition`.. You don't have to duplicate assignment inside loops. – Brandon Feb 10 '18 at 18:25
  • 1
    You seem to like baroque code structure ;^). E.g. in `loadRelativeClassNode`, you can pass the `cls` parameter directly to the `ClassReader`, making the first three (nontrivial) statements obsolete. Also `Arrays.stream(m.instructions.toArray()).forEachOrdered(instruction -> {` can be replaced with `m.instructions.iterator().forEachRemaining(instruction -> {`. But actually, the entire thing you’re doing there can be simplified to `Printer printer = new Textifier(); m.instructions.accept(new TraceMethodVisitor(printer)); printer.getText().forEach(System.out::print);`… – Holger Feb 14 '18 at 13:32
  • …and when you remove the unneeded parts of `loadRelativeClassNode` and `loadJVMClassNode`, there is no difference between these two methods. – Holger Feb 14 '18 at 13:49
  • @Holger; You can't just pass "cls" parameter to the reader. It doesn't always find it. It throws a "ClassNotFound" exception if I do that on my system. Hence why I explicitly constructed the class' path. 10/10 did not think about using iterator though. +1 for that. Also did not think about using `Textifier`. Good looking out :D – Brandon Feb 14 '18 at 14:13
  • Since you are using `ClassLoader.getSystemClassLoader()` explicitly, I can’t imagine any scenario, where this succeeds while `ClassReader`’s constructor, using `ClassLoader.getSystemResourceAsStream` would fail. Besides that, you are using `Textifier`, but not its visitor role. When using the Visitor API consistently, the entire code becomes even smaller, less than twenty lines for everything… – Holger Feb 14 '18 at 14:24