As discussed in Eric Lippert's blog post "Closing over the loop variable considered harmful", closing over the loop variable in C# can have unexpected consequences. I was trying to understand whether the same "gotcha" applies to Scala.

First of all, since this is a Scala question, I'll try to explain Eric Lippert's C# example by adding a few comments to his code:

// Create a list of integers
var values = new List<int>() { 100, 110, 120 };

// Create a mutable, empty list of functions that take no input and return an int
var funcs = new List<Func<int>>();

// For each integer in the list of integers we're trying
// to add a function to the list of functions
// that takes no input and returns that integer
// (actually that's not what we're doing and there's the gotcha).
foreach(var v in values)
  funcs.Add( ()=>v );

// Apply the functions in the list and print the returned integers.
foreach(var f in funcs)
  Console.WriteLine(f());

Most people expect this program to print 100, 110, 120. It actually prints 120, 120, 120. The issue is that the () => v function we add to the funcs list closes over the variable v, not over v's value. As v changes value in the first loop, all three closures we add to the funcs list "see" the same variable v, which (by the time we apply them in the second loop) has the value 120.

I've tried to translate the example code to Scala:

import collection.mutable.Buffer
val values = List(100, 110, 120)
val funcs = Buffer[() => Int]()

for(v <- values) funcs += (() => v)
funcs foreach ( f => println(f()) )
// prints 100 110 120
// so Scala can close over the loop variable with no issue, or can it?

Does Scala really not suffer from the same issue, or have I just translated Eric Lippert's code badly and failed to reproduce it?

This behavior has tripped up many a valiant C# developer, so I wanted to make sure there are no similar gotchas in Scala. But also, once you understand why C# behaves the way it does, the output of Eric Lippert's example code kind of makes sense (it's the way closures work, basically). So what is Scala doing differently?

Peter Mortensen
Paolo Falabella
  • `v` isn't a mutable variable in the Scala code. Keep in mind that `for` comprehensions are _not_ `for` loops. The Scala code actually translates to something far more functional in nature than a standard `for` loop, so, where you had one `v` with many values in C# code, you have multiple `v`s that each get their own single value in the Scala code. – Destin Mar 23 '12 at 16:15
  • @Destin: thanks, you should have posted that as an answer. I would have at least upvoted it. (You can still do it, actually) – Paolo Falabella Mar 23 '12 at 16:38

4 Answers

Scala doesn't have the same problem because v is not a var, it's a val. Therefore, when you write

() => v

the closure captures an immutable binding, so the compiler can produce a function that always returns the value v had when the closure was created.

If instead you use a var, you can have the same problem. But it's a lot clearer that this is the asked-for behavior, since you explicitly create a var, and then have the function return it:

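val values = Array(100, 110, 120)
val funcs = collection.mutable.Buffer[() => Int]()
var value = 0  // one mutable variable shared by every closure
var i = 0
while (i < values.length) {
  value = values(i)
  funcs += (() => value)  // closes over the variable, not its current value
  i += 1
}
funcs foreach (f => println(f()))
// prints 120 120 120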

(Note that if you try funcs += (() => values(i)) you will get an out-of-bounds exception, because you have closed over the variable i, which is 3 by the time you call the function!)
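If the while loop is really needed, one way to avoid the shared variable is to copy the current element into a new val inside the loop body. The following is only a minimal sketch of that idea; the object name FreshValPerIteration and the helper build are made up for illustration and do not come from the answer above.

```scala
import scala.collection.mutable.Buffer

// Hypothetical names, for illustration only.
object FreshValPerIteration {
  // Builds the closures with a while loop, but binds each element to a
  // new val on every pass so that each closure captures its own value.
  def build(values: Array[Int]): Buffer[() => Int] = {
    val funcs = Buffer[() => Int]()
    var i = 0
    while (i < values.length) {
      val current = values(i) // fresh immutable binding for this iteration
      funcs += (() => current)
      i += 1
    }
    funcs
  }

  def main(args: Array[String]): Unit =
    build(Array(100, 110, 120)).foreach(f => println(f())) // prints 100, 110, 120
}
```

The index i is still a var, but no closure refers to it, so calling the functions later is safe.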

Rex Kerr
  • Thanks for reproducing the same behavior in Scala. Now that I've seen how un-idiomatic the resulting Scala would be, I'm confident that (as you say) it's not something likely to happen by accident. – Paolo Falabella Mar 23 '12 at 16:46

The close equivalent of the C# example would be with a while loop and a var. It would behave as in C#.

On the other hand, for(v <- values) funcs += (() => v) is translated to values.foreach(v => funcs += (() => v)).

Just to give things names, that could be written as

def iteration(v: Int) = { funcs += (() => v) }
values.foreach(iteration)

The closure () => v appears in the body of iteration, and what it captures is not some var shared by all iterations, but the argument of the call to iteration, which is not shared, and moreover is a constant value rather than a variable. This prevents the unintuitive behavior.

There may well be a variable in the implementation of foreach, but it is not what the closure sees.

If, in C#, you move the body of the loop into a separate method, you get the same effect.
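The same point can be sketched in Scala itself: keep the while loop and the shared var, but create the closure inside a separate method. This is only an illustrative sketch; LoopBodyAsMethod, constant and build are hypothetical names chosen for the example.

```scala
import scala.collection.mutable.Buffer

// Hypothetical names, for illustration only.
object LoopBodyAsMethod {
  // v is a method parameter, so every call gets its own binding,
  // and the returned closure captures that binding.
  def constant(v: Int): () => Int = () => v

  def build(values: Array[Int]): Buffer[() => Int] = {
    val funcs = Buffer[() => Int]()
    var value = 0 // still a single shared mutable variable
    var i = 0
    while (i < values.length) {
      value = values(i)
      funcs += constant(value) // value is read here and passed by value
      i += 1
    }
    funcs
  }

  def main(args: Array[String]): Unit =
    build(Array(100, 110, 120)).foreach(f => println(f())) // prints 100, 110, 120
}
```

The closures never see the var value; they only see the parameter of the call that created them.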

Didier Dupont
  • eeek.... Two great answers posted at exactly the same time and I can mark only one as the answer! Sorry, I can only give you a +1, but I really appreciate your insight – Paolo Falabella Mar 23 '12 at 16:52

Note that Scala's for-comprehension works in a very different manner. This:

for(v <- values) funcs += (() => v)

is translated at compile time into this:

values.foreach(v => funcs += (() => v))

So v is a new variable for each value.
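The same translation applies when the comprehension has a yield, in which case it becomes a call to map and the mutable buffer is not needed at all. A small sketch, assuming the values list from the question (YieldDemo, viaFor and viaMap are names invented for this example):

```scala
// Hypothetical names, for illustration only.
object YieldDemo {
  val values = List(100, 110, 120)

  // A for comprehension with yield ...
  val viaFor: List[() => Int] = for (v <- values) yield (() => v)

  // ... is rewritten by the compiler into a call to map,
  // where v is the parameter of the function passed to map.
  val viaMap: List[() => Int] = values.map(v => () => v)

  def main(args: Array[String]): Unit = {
    viaFor.foreach(f => println(f())) // prints 100, 110, 120
    viaMap.foreach(f => println(f())) // prints 100, 110, 120
  }
}
```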

Daniel C. Sobral

If you disassemble the C# example, you'll see that a class to hold the closed variables is generated by the compiler. Reflector renders that class like:

[CompilerGenerated]
private sealed class <>c__DisplayClass2
{
    // Fields
    public int v;

    // Methods
    public int <Main>b__1()
    {
        return this.v;
    }
}

Reflector renders such pretty C# that you can't really see how that class is being used. To see that, you need to look at the raw IL:

.method private hidebysig static void Main(string[] args) cil managed
{
    .entrypoint
    .maxstack 4
    .locals init (
        [0] class [mscorlib]System.Collections.Generic.List`1<int32> values,
        [1] class [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Func`1<int32>> funcs,
        [2] class ConsoleApplication1.Program/<>c__DisplayClass2 CS$<>8__locals3,
        [3] class [mscorlib]System.Func`1<int32> f,
        [4] class [mscorlib]System.Collections.Generic.List`1<int32> <>g__initLocal0,
        [5] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator`0<int32> CS$5$0000,
        [6] bool CS$4$0001,
        [7] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Func`1<int32>> CS$5$0002)
    L_0000: nop 
    L_0001: newobj instance void [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
    L_0006: stloc.s <>g__initLocal0
    L_0008: ldloc.s <>g__initLocal0
    L_000a: ldc.i4.s 100
    L_000c: callvirt instance void [mscorlib]System.Collections.Generic.List`1<int32>::Add(!0)
    L_0011: nop 
    L_0012: ldloc.s <>g__initLocal0
    L_0014: ldc.i4.s 110
    L_0016: callvirt instance void [mscorlib]System.Collections.Generic.List`1<int32>::Add(!0)
    L_001b: nop 
    L_001c: ldloc.s <>g__initLocal0
    L_001e: ldc.i4.s 120
    L_0020: callvirt instance void [mscorlib]System.Collections.Generic.List`1<int32>::Add(!0)
    L_0025: nop 
    L_0026: ldloc.s <>g__initLocal0
    L_0028: stloc.0 
    L_0029: newobj instance void [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Func`1<int32>>::.ctor()
    L_002e: stloc.1 
    L_002f: nop 
    L_0030: ldloc.0 
    L_0031: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator`0<!0> [mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator()
    L_0036: stloc.s CS$5$0000
    L_0038: newobj instance void ConsoleApplication1.Program/<>c__DisplayClass2::.ctor()
    L_003d: stloc.2 
    L_003e: br.s L_0060
    L_0040: ldloc.2 
    L_0041: ldloca.s CS$5$0000
    L_0043: call instance !0 [mscorlib]System.Collections.Generic.List`1/Enumerator`0<int32>::get_Current()
    L_0048: stfld int32 ConsoleApplication1.Program/<>c__DisplayClass2::v
    L_004d: ldloc.1 
    L_004e: ldloc.2 
    L_004f: ldftn instance int32 ConsoleApplication1.Program/<>c__DisplayClass2::<Main>b__1()
    L_0055: newobj instance void [mscorlib]System.Func`1<int32>::.ctor(object, native int)
    L_005a: callvirt instance void [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Func`1<int32>>::Add(!0)
    L_005f: nop 
    L_0060: ldloca.s CS$5$0000
    L_0062: call instance bool [mscorlib]System.Collections.Generic.List`1/Enumerator`0<int32>::MoveNext()
    L_0067: stloc.s CS$4$0001
    L_0069: ldloc.s CS$4$0001
    L_006b: brtrue.s L_0040
    L_006d: leave.s L_007e
    L_006f: ldloca.s CS$5$0000
    L_0071: constrained. [mscorlib]System.Collections.Generic.List`1/Enumerator`0<int32>
    L_0077: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    L_007c: nop 
    L_007d: endfinally 
    L_007e: nop 
    L_007f: nop 
    L_0080: ldloc.1 
    L_0081: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator`0<!0> [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Func`1<int32>>::GetEnumerator()
    L_0086: stloc.s CS$5$0002
    L_0088: br.s L_009e
    L_008a: ldloca.s CS$5$0002
    L_008c: call instance !0 [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Func`1<int32>>::get_Current()
    L_0091: stloc.3 
    L_0092: ldloc.3 
    L_0093: callvirt instance !0 [mscorlib]System.Func`1<int32>::Invoke()
    L_0098: call void [mscorlib]System.Console::WriteLine(int32)
    L_009d: nop 
    L_009e: ldloca.s CS$5$0002
    L_00a0: call instance bool [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Func`1<int32>>::MoveNext()
    L_00a5: stloc.s CS$4$0001
    L_00a7: ldloc.s CS$4$0001
    L_00a9: brtrue.s L_008a
    L_00ab: leave.s L_00bc
    L_00ad: ldloca.s CS$5$0002
    L_00af: constrained. [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Func`1<int32>>
    L_00b5: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    L_00ba: nop 
    L_00bb: endfinally 
    L_00bc: nop 
    L_00bd: ret 
    .try L_0038 to L_006f finally handler L_006f to L_007e
    .try L_0088 to L_00ad finally handler L_00ad to L_00bc
}

Inside the first foreach, you can see that only one instance of that class is created. The iterator's values are assigned into that instance's public v field. The funcs list is populated with delegates to that object's b__1 method.

So essentially what's happening in C# is:

  1. Create a single closure scope object.
  2. Iterate over the values:
    1. Update the closure scope object's v with the current value.
    2. Push a delegate to the closure scope object's accessor method into funcs.
  3. Iterate over funcs. Each call returns the current value of v, which by then is the last value assigned.
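The steps above can be imitated by hand in Scala, which may make the sharing easier to see. This is a rough sketch rather than what either compiler actually emits: the Scope class is a hypothetical stand-in for the generated <>c__DisplayClass2, and the second method is included only for contrast.

```scala
// Hypothetical names, for illustration only.
object DisplayClassSketch {
  // Stand-in for the compiler-generated closure scope class.
  final class Scope {
    var v: Int = 0
    def get(): Int = v
  }

  // One scope object shared by all iterations, as in the IL above.
  def sharedScope(values: List[Int]): List[() => Int] = {
    val scope = new Scope
    values.map { x =>
      scope.v = x // overwrite the shared field
      val getter: () => Int = () => scope.get()
      getter
    }
  }

  // For contrast: one scope object per iteration.
  def scopePerIteration(values: List[Int]): List[() => Int] =
    values.map { x =>
      val scope = new Scope
      scope.v = x
      val getter: () => Int = () => scope.get()
      getter
    }

  def main(args: Array[String]): Unit = {
    sharedScope(List(100, 110, 120)).foreach(f => println(f()))       // prints 120, 120, 120
    scopePerIteration(List(100, 110, 120)).foreach(f => println(f())) // prints 100, 110, 120
  }
}
```
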
Leif Wickland