25

I think I've met this classic situation in JavaScript.

Usually the programmer would expect this code below to print "Peter", "Paul", "Mary".

But it doesn't. Could anyone explain exactly why it works this way in Java?

This Java 8 code compiles OK and prints "Mary" three times.

I guess it's a matter of how it's implemented deep down
but ... doesn't this indicate a wrong underlying implementation?

import java.util.List;
import java.util.ArrayList;

public class Test008 {

    public static void main(String[] args) {
        String[] names = { "Peter", "Paul", "Mary" };
        List<Runnable> runners = new ArrayList<>();

        int[] ind = {0};
        for (int i = 0; i < names.length; i++){ 
            ind[0] = i;
            runners.add(() -> System.out.println(names[ind[0]]));
        }

        for (int k=0; k<runners.size(); k++){
            runners.get(k).run();
        }

    }

}

On the contrary, if I use an enhanced for loop (while adding the Runnables), the correct (i.e. all the different) values are captured.

for (String name : names){
    runners.add(() -> System.out.println(name));
}

Finally, if I use a classic for loop (while adding the Runnables), then I get a compilation error (which makes perfect sense, as the variable i is not final or effectively final).

for (int i = 0; i < names.length; i++){ 
    runners.add(() -> System.out.println(names[i]));
}

EDIT:

My point is: why is not the value of names[ind[0]] captured (the value it has at the moment I add the Runnables)? It should not matter when I execute the lambda expressions, right? I mean, OK, in the version with the enhanced for loop, I also execute the Runnables later but the correct/distinct values were captured earlier (when adding the Runnables).

In other words, why does not Java always have this by value / snapshot semantics (if I may put it this way) when capturing values? Wouldn't it be cleaner and make more sense?

peter.petrov
  • 4
    Surely the point here is that you are closing over the `int[]` reference `ind`, not the value of `ind[0]`, so when you change `ind[0]`, the updated value is shared between your lambdas. – Andy Turner Oct 12 '15 at 11:00
  • 1
    What would happen if you create a new Runnable with `new`? Does that change the output like you would expect it? – Clayn Oct 12 '15 at 11:08
  • @Clayn No, if I do new Runnable and implement that Runnable in place, it still has the same effect, "Mary" is printed 3 times. You can try it too. – peter.petrov Oct 12 '15 at 11:13
  • 3
    @peter I wanted you to see that. That's the thing your closure does (more or less). So that's no special thing with those expressions. `ind[0]` is an expression and can't be evaluated at the point you expect it to happen. How could the JVM decide which expression to evaluate and which not? – Clayn Oct 12 '15 at 11:14
  • @Clayn Hm, I see. OK, I will think some more about it. Makes sense. But wouldn't it be cleaner if the capturing is done always by value/by snapshot so that the value is captured at the moment we add those Runnables there?! But then... maybe then it will be incompatible with your example, maybe that's the whole thing, the two examples need to behave in the same way. – peter.petrov Oct 12 '15 at 11:16
  • @Clayn "How could the JVM decide which expression to evaluate and which not?" I was saying: can't the JVM use just an 'evaluate all expressions semantics' i.e. assume I want to evaluate everything at the time I reference it. – peter.petrov Oct 12 '15 at 11:19
  • 1
    @peter "I want to evaluate everything at the time I reference it" so you want to print the names in that loop? What if you do something more complex? a reason not to do this may be (i dont know if it is THAT reason), is lazy initialization. I know an example but have to search it first. So i will edit this later – Clayn Oct 12 '15 at 11:21
  • 1
    Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/92036/discussion-between-clayn-and-peter-petrov). – Clayn Oct 12 '15 at 11:27
  • Since it seems you are coming from the JavaScript world, you should notice that the behaviour would actually be [the same in JS](http://jsfiddle.net/L08puwkt/1/). – Didier L Oct 12 '15 at 14:36
  • @DidierL I am not coming from a JS world. I've just seen a similar example in JS in the famous Crockford book. Thanks for the reference though. – peter.petrov Oct 12 '15 at 14:40

6 Answers

47

In other words, why does not Java always have this by value / snapshot semantics (if I may put it this way) when capturing values? Wouldn't it be cleaner and make more sense?

Java lambdas do have capture-by-value semantics.

When you do:

runners.add(() -> System.out.println(names[ind[0]]));

the lambda here is capturing two values: ind and names. Both of these values happen to be object references, but an object reference is a value just like '3'. When a captured object reference refers to a mutable object, this is where things can get confusing, because the object's state at lambda capture and its state at lambda invocation may be different. Specifically, the array to which ind refers does change in this way, which is the cause of your problem.

What the lambda does not capture is the value of the expression ind[0]. Instead, it captures the reference ind, and, when the lambda is invoked, performs the dereference ind[0]. Lambdas close over values from their lexically enclosing scope; ind is a value in the lexically enclosing scope, but ind[0] is not. We capture ind and use it further at evaluation time.

You are somehow expecting here that the lambda will do a full snapshot of all objects in the entire heap that are reachable through the captured references, but that's not how it works -- nor would that make a lot of sense.

Summary: Lambdas capture all captured arguments -- including object references -- by value. But an object reference and the object to which it refers are not the same thing. If you capture a reference to a mutable object, the object's state may have changed by the time the lambda is invoked.
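To make the distinction concrete, here is a minimal sketch (the class and method names CaptureDemo, captureReference and captureSnapshot are made up for illustration, and Supplier is used instead of Runnable only so the results can be collected). The first method captures the array reference and dereferences it at invocation time; the second evaluates names[ind[0]] into a fresh local before the lambda is created, so each lambda captures a different String reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class CaptureDemo {

    // Captures the references names and ind; ind[0] is read only when get() is called.
    public static List<String> captureReference(String[] names) {
        int[] ind = {0};
        List<Supplier<String>> suppliers = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            ind[0] = i;
            suppliers.add(() -> names[ind[0]]);
        }
        return collect(suppliers);
    }

    // Evaluates names[ind[0]] during the loop and captures the resulting String reference.
    public static List<String> captureSnapshot(String[] names) {
        int[] ind = {0};
        List<Supplier<String>> suppliers = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            ind[0] = i;
            String current = names[ind[0]]; // evaluated now, effectively final
            suppliers.add(() -> current);
        }
        return collect(suppliers);
    }

    // Invokes every supplier after the loops above have finished.
    private static List<String> collect(List<Supplier<String>> suppliers) {
        List<String> out = new ArrayList<>();
        for (Supplier<String> s : suppliers) {
            out.add(s.get());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"Peter", "Paul", "Mary"};
        System.out.println(captureReference(names)); // [Mary, Mary, Mary]
        System.out.println(captureSnapshot(names));  // [Peter, Paul, Mary]
    }
}
```

In both methods the capture itself is by value; what differs is which value is captured (the int[] reference versus a String reference computed up front).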

Brian Goetz
  • 2
    I think this is the cleanest explanation here. Now I really have the feeling I understand it in depth. It makes perfect sense now. I will re-read it a few more times later. Thanks a lot. – peter.petrov Oct 12 '15 at 11:49
  • "lambdas do have capture-by-value semantics" In a way that's similar to parameter passing in Java, I guess. One may think of it this way, I guess. Well, I have no problem with the concept of parameter passing, but this question here got me a bit confused and thinking. Now I get it. – peter.petrov Oct 12 '15 at 11:53
  • 2
    @peter.petrov Yes, exactly. Just as object references (and everything else) are passed by value, object references (and everything else) are captured by value. – Brian Goetz Oct 12 '15 at 11:55
  • 1
    @peter.petrov You should also see how adopting functional ideas and paradigms without taking the whole thing is tricky. Of course, this will never happen in a strict functional code, because all functions would be pure and all the values would be immutable. Always be careful when using functional-like code with mutable values and impure functions, or you can get burnt easily - it may be fine under some conditions (e.g. if you didn't delay executing the lambda), but will break under others (as in your case). – Luaan Oct 12 '15 at 15:51
  • "Java lambdas *do* have capture-by-value semantics." - oh god, when you said that, I thought there'd been some change that makes it actually matter. (It still doesn't matter, right? There's nothing where the language would actually behave differently if lambdas captured variables instead of values?) – user2357112 Oct 13 '15 at 04:15
  • 2
    @user2357112 As long as captured variables are required to be `final`, it makes no difference whether the variables themselves or their current values are captured. – user253751 Oct 13 '15 at 07:15
14

I would say the code does exactly what you tell it to do.

You create Runnables which print the name at the position given by ind[0]. But that expression is evaluated in your second for loop, and at that point ind[0] is 2. So the expression prints "Mary".

Andy Turner
Clayn
10

When you create a lambda expression, you are not executing it. It's only executed in your second for loop, when you invoke the run method of each Runnable.

When the second loop is executed, ind[0] contains 2, therefore "Mary" is printed in the execution of all the run methods.
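The ordering can be checked with a small sketch (DeferredDemo and trace are hypothetical names introduced here). It records when the lambda is created and when its body actually runs:

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredDemo {

    // Returns the order in which the events happened.
    public static List<String> trace() {
        List<String> log = new ArrayList<>();
        Runnable r = () -> log.add("lambda body ran"); // nothing is logged here yet
        log.add("lambda created");
        r.run();                                       // the body executes only now
        log.add("after run()");
        return log;
    }

    public static void main(String[] args) {
        // Prints: lambda created, lambda body ran, after run() (one per line)
        trace().forEach(System.out::println);
    }
}
```

Creating the lambda only stores the code together with the captured reference to log; the body runs when run() is invoked.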

EDIT:

The enhanced for loop behaves differently because in that snippet the lambda expression holds a reference to a String instance, and String is immutable. If you change String to StringBuilder, you can build an example with an enhanced for loop that also prints the final value of the instances being referenced :

StringBuilder[] names = { new StringBuilder().append ("Peter"), new StringBuilder().append ("Paul"), new StringBuilder().append ("Mary") };
List<Runnable> runners = new ArrayList<>();

for (StringBuilder name : names){
  runners.add(() -> System.out.println(name));
  name.setLength (0);
  name.append ("Mary");
}

for (int k=0; k<runners.size(); k++){
  runners.get(k).run();
}

Output :

Mary
Mary
Mary
Eran
  • OK, sure, but my point is: why is not the value of `names[ind[0]]` captured (the value it has at the moment I add the Runnables)? It doesn't matter when I execute the lambda expression. In the enhanced for loop I also execute the Runnables later but the correct values are captured. – peter.petrov Oct 12 '15 at 11:02
  • 3
    @peter.petrov Why should it? Its the same as using `new Runnable(){System.out.println(names[ind[0]]);};` That should cause the same output – Clayn Oct 12 '15 at 11:04
  • 1
    @peter.petrov In the enhanced for loop you are referring directly to String references (held by the `name` variable). If instead of `String`s you would `println` some mutable class instances (which you would later mutate before executing the run methods), the run methods will print the final state of the instances, not the state at the time the lambda was created. – Eran Oct 12 '15 at 11:12
  • 1
    @peter.petrov The evaluation of the argument of the println is part of the lambda expression - think of it as "source code on ice". Only running it accesses ind[0]. – laune Oct 12 '15 at 11:13
  • OK, guys, I see perfectly what you're saying. I will think about it once again later. Hopefully by then the question will receive some more answers too. – peter.petrov Oct 12 '15 at 11:15
  • Thanks, the example with the StringBuilders is good. So even though you modify them after adding the Runnables, you still get Mary printed 3 times. Interesting. OK. – peter.petrov Oct 12 '15 at 11:34
6

It is working as designed. By the time println is called, ind[0] is 2, because all the lambdas share one array and its element is overwritten on every iteration before any of them runs. So it will always print Mary. You can do the following instead to check:

    for (int i = 0; i < names.length; i++) {
        final int j = i;
        runners.add(() -> System.out.println(names[j]));
    }

This will print all the names as desired.

Or declare ind locally, so that each lambda captures its own array:

    for (int i = 0; i < names.length; i++) {
        final int[] ind = { 0 };
        ind[0] = i;
        runners.add(() -> System.out.println(names[ind[0]]));
    }
stinepike
  • I don't think the `final int j = i;` is necessary but i'm not quite sure and can't test it now – Clayn Oct 12 '15 at 11:05
4

Another way to explain this behavior is to look at what the lambda expression replaces:

runners.add(() -> System.out.println(names[ind[0]]));

is roughly equivalent to

runners.add(new Runnable() {
    // copies of the captured references, taken when the Runnable is created
    final String[] names_inner = names;
    final int[] ind_inner = ind;
    public void run() {
        System.out.println(names_inner[ind_inner[0]]);
    }
});

which turns into

// yes, this is inside the main method's body
class Test008$1 implements Runnable {
    final String[] names_inner;
    final int[] ind_inner;

    private Test008$1(String[] n, int[] i) {
        names_inner = n; ind_inner = i;
    }

    public void run() {
        System.out.println(names_inner[ind_inner[0]]);
    }
}
runners.add(new Test008$1(names,ind));

(The names of the generated classes and fields don't really matter. The dollar sign is just a character conventionally used in the names of generated methods/classes in Java, which is why it should be avoided in hand-written identifiers.)

The two inner fields are added so that the code inside the Runnable can see the variables outside of it. The compiler copies the captured references into such fields whether names and ind are declared final or are merely effectively final, as they are here. This is how the variables are captured by value, as Brian Goetz explains in his answer.

Arrays are objects, so if you pass one to a method that modifies it, the modification is visible through every other reference to the same array; the rest is just OOP basics. So this is in fact the correct behavior.
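As an illustration of the point about arrays, here is a small sketch (ArrayAliasDemo, setFirst and reassign are names invented for this example). Writing through the reference changes the caller's array, while reassigning the parameter only changes the method's own copy of the reference:

```java
public class ArrayAliasDemo {

    // Writes through the reference: the caller's array is modified.
    public static void setFirst(int[] target, int value) {
        target[0] = value;
    }

    // Rebinds only this method's copy of the reference; the caller's array is untouched.
    public static void reassign(int[] target) {
        target = new int[] {99};
    }

    public static void main(String[] args) {
        int[] ind = {0};
        setFirst(ind, 2);
        System.out.println(ind[0]); // 2
        reassign(ind);
        System.out.println(ind[0]); // still 2
    }
}
```

A lambda that captured ind behaves like setFirst's caller here: it holds a copy of the reference, so it sees every write made to the shared array.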

Nulano
  • Thanks. Out of curiosity: how did you get to this generated code, did you use some tool? – peter.petrov Oct 12 '15 at 18:30
  • 1
    @peter.petrov The middle one is just how you had to do this before lambdas were added (before Java 8) and the last one is mostly just an assumption (it's the middle one converted from an anonymous class to a non-anonymous class). Also, I have decompiled anonymous classes before, and if I remember rightly, this is pretty much what you get (the last one). – Nulano Oct 13 '15 at 16:43
2

Well, before lambdas, Java had a stricter rule for capturing local variables and parameters in anonymous classes: only final variables could be captured. I believe this is to prevent concurrency confusion, since final variables are not going to change. So if you pass instances created this way into a task queue and have them executed by another thread, at least you know the two threads are sharing the same value. A deeper reason is explained in the question "Why are only final variables accessible in anonymous class?"

Now, with lambdas and Java 8, you can capture variables that are "effectively final": variables that are not declared final but are treated as final by the compiler because they are never reassigned after initialization. Basically, if adding the final keyword to a variable's declaration would not make the compiler complain, the variable is effectively final.
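A minimal sketch of the rule (EffectivelyFinalDemo and capture are hypothetical names): the local variable below is never reassigned, so it can be captured without the final keyword.

```java
import java.util.function.IntSupplier;

public class EffectivelyFinalDemo {

    public static int capture() {
        int base = 40; // never reassigned, so it is effectively final
        IntSupplier plusTwo = () -> base + 2;
        // base++; // uncommenting this would make the capture above a compile-time error
        return plusTwo.getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(capture()); // 42
    }
}
```

This is the same reason the classic for loop in the question fails to compile: its counter i is incremented, so it is not effectively final.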

xiaofeng.li
  • 2
    You can think of this relaxation of the rules as just another place where we extended the use of type inference -- the compiler _infers_ the finality of local variables or method parameters captured in lambdas / inner classes, rather than relying on a manifest declaration of finality. – Brian Goetz Oct 12 '15 at 11:58