I was under the impression that lambda expression contexts in C# contain references to the variables of the parent function scope that are used in them. Consider:

public class Test
{
    private static System.Action<int> del;

    public static void test(){
        int i = 100500;
        del = a => System.Console.WriteLine("param = {0}, i = {1}", a, i);
        del(1);

        i = 10;
        del(1);

    }

    public static void Main()
    {
        test();
    }
}

outputs

param = 1, i = 100500
param = 1, i = 10

However, if this were true, the following would be illegal, because the lambda context would reference a local variable that has gone out of scope:

public class Test
{
    private static System.Action<int> del;

    public static void test(){
        int i = 100500;
        del = a => System.Console.WriteLine("param = {0}, i = {1}", a, i);
    }

    public static void Main()
    {
        test();
        del(1);
    }
}

Yet this compiles, runs, and outputs

param = 1, i = 100500

This means that either something weird is going on, or the context keeps the values of the local variables, not references to them. But if that were true, it would have to update them on every lambda invocation, and I don't see how that would work once the original variables go out of scope. It also seems that this could incur an overhead when dealing with large value types.

I know that, for example, in C++, this is UB (confirmed in answer to this question).

The question is, is this well-defined behaviour in C#? (I think C# does have some UB, or at least some IB, right?)

If it is well-defined, how and why does this actually work? (implementation logic would be interesting)

  • [Lambda Expressions](https://msdn.microsoft.com/en-gb/library/bb397687.aspx): "Lambdas can refer to outer variables (...) that are in scope in the method that defines the lambda function, or in scope in the type that contains the lambda expression. **Variables that are captured in this manner are stored for use in the lambda expression** even if the variables would otherwise go out of scope and be garbage collected ..." (My emphasis) – Damien_The_Unbeliever Mar 20 '15 at 08:07
  • This should answer your question: http://stackoverflow.com/questions/9591476/are-lambda-expressions-in-c-sharp-closures – Robin Mar 20 '15 at 08:10
  • @Damien_The_Unbeliever: yep, thanks, seems like the behaviour is defined, which resolves part of the question. However, I still would like to know the implementation logic =) –  Mar 20 '15 at 08:12
  • Skeet's [article about closures](http://csharpindepth.com/articles/chapter5/closures.aspx) goes a bit more in-depth. Also, this answer contains some info about the actual generated code: http://stackoverflow.com/a/7142956/996081 – cbr Mar 20 '15 at 08:12
  • @GrawCube: thanks, that answers another part of my question! This (I think) means that the closure stores values, not references, which get updated when the corresponding variable in the enclosing scope is modified (did I get this right?). Looks like the compiler simply doesn't care about the overhead... –  Mar 20 '15 at 08:27
  • What happens is that a class is generated by the compiler and the variable you're using is "lifted" onto this class. Internally in your method, a new instance of that class is constructed, and all usages of that variable in that method go through that instance. The actual delegate you declare is also lifted onto this class as a normal method. – Lasse V. Karlsen Mar 20 '15 at 08:28
  • As such it does not capture *values*, it captures *variables*. – Lasse V. Karlsen Mar 20 '15 at 08:29
  • @LasseV.Karlsen: thanks, this wraps it all up nicely! –  Mar 20 '15 at 08:33

1 Answer

Closures, as they relate to the lambda syntax in C#, are a large topic, too large to cover fully in one answer, but let's at least try to answer the specific question here. The actual answer is at the bottom; everything in between is the background needed to understand it.


When the compiler compiles a method that uses anonymous methods, it rewrites that method to some extent.

Basically, a new class is generated and the anonymous method is lifted into this class. It's given a name, albeit an internal one, so for the compiler it sort of transitions from an anonymous method into a named method. You, however, don't have to know or handle that name.

Any variables that this method requires (variables declared outside the anonymous method, but in the same method that declares it) are lifted as well, and all usages of those variables are rewritten to go through the new class.

There are a couple of methods involved here now, which makes the text above hard to follow, so let's look at an example instead:

public Func<int, int> Test1()
{
    int a = 42;
    return value => a + value;
}

This method is rewritten to something like this:

public Func<int, int> Test1()
{
    var dummy = new <>c__DisplayClass1();
    dummy.a = 42;
    return dummy.<Test1>b__0;
}

internal class <>c__DisplayClass1
{
    public int a;
    public int <Test1>b__0(int value)
    {
        return a + value;
    }
}

The compiler can handle all these funky names (and yes, they really are named with all the brackets like that) because internally it refers to things by IDs and object references, so the names are no longer an issue for it. You, however, can never declare a class or a method with those names, so there's no risk of the compiler generating a class that happens to already exist.

Here's a LINQPad example that shows that a class I declared, although with fewer brackets in its names, looks identical to the one generated by the compiler:
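One consequence of lifting the variable itself, rather than copying its value, is that several lambdas declared in the same method share it. The following is a minimal sketch of that, assuming a compiler with C# 7 tuple support; `SharedCaptureDemo`, `MakeCounter`, and `count` are names made up for this illustration:

```csharp
using System;

public static class SharedCaptureDemo
{
    // Returns two delegates that close over the same local variable.
    public static (Action increment, Func<int> read) MakeCounter()
    {
        int count = 0;                    // lifted onto the generated closure object
        Action increment = () => count++; // writes the captured variable
        Func<int> read = () => count;     // reads the same captured variable
        return (increment, read);
    }

    public static void Main()
    {
        var (increment, read) = MakeCounter();
        increment();
        increment();
        Console.WriteLine(read()); // prints 2
    }
}
```

Both delegates see every change to `count`, even though `MakeCounter` has already returned by the time they are invoked.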

void Main()
{
    var f1 = Test1();
    f1(10).Dump();
    f1.Dump();

    var f2 = Test2();
    f2(10).Dump();
    f2.Dump();
}

public Func<int, int> Test1()
{
    int a = 42;
    return value => a + value;
}

public Func<int, int> Test2()
{
    var dummy = new __c__DisplayClass1();
    dummy.a = 42;
    return dummy._Test2_b__0;
}

public class __c__DisplayClass1
{
    public int a;
    public int _Test2_b__0(int value)
    {
        return a + value;
    }
}

Output:

[Screenshot: LINQPad output]

In the screenshot above, each delegate variable shows two things: a Method property and a Target property.

When the method is called, it is called with a this reference referring to the Target object. A delegate thus captures two things: which method to call, and the object on which to call it.

So basically, that object of that generated class survives as part of the delegate because it is the target of the method.
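The same two properties can also be inspected from a plain console program. This is only a sketch: the exact type and method names printed are compiler implementation details and may differ between compiler versions, so no particular output is claimed for them. `DelegateInspection` is a name made up for this illustration.

```csharp
using System;

public static class DelegateInspection
{
    public static Func<int, int> Test1()
    {
        int a = 42;
        return value => a + value;
    }

    public static void Main()
    {
        Func<int, int> f1 = Test1();
        Func<int, int> f2 = Test1();

        // Target is the closure object, Method is the lifted lambda.
        // Their names are chosen by the compiler and are not guaranteed.
        Console.WriteLine(f1.Target.GetType().Name);
        Console.WriteLine(f1.Method.Name);

        // Each call to Test1 constructs its own closure object.
        Console.WriteLine(ReferenceEquals(f1.Target, f2.Target)); // prints False
        Console.WriteLine(f1(10)); // prints 52
    }
}
```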


With all that in mind, let's look at your question:

Why does a lambda expression preserve enclosing scope variable values after method terminates?

A: If the lambda survives, all the captured variables survive as well because they're no longer local variables of the method they were declared in. Instead they were lifted onto a new object that also has the lambda method, and thus "follows" the lambda everywhere it goes.
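Applied to the code in the question, the rewrite can be sketched by hand roughly as follows. This is an approximation of what the compiler produces, not its actual output; `Closure`, `Lambda`, and `HandWrittenTest` are made-up stand-ins for the generated names.

```csharp
using System;

// Hand-written stand-in for the compiler-generated class.
public class Closure
{
    public int i; // the lifted local variable

    public void Lambda(int a)
    {
        Console.WriteLine("param = {0}, i = {1}", a, i);
    }
}

public class HandWrittenTest
{
    private static Action<int> del;

    public static void test()
    {
        var closure = new Closure();
        closure.i = 100500;    // was: int i = 100500;
        del = closure.Lambda;  // the delegate's Target is the closure object
        del(1);                // prints: param = 1, i = 100500

        closure.i = 10;        // was: i = 10;
        del(1);                // prints: param = 1, i = 10
    }

    public static void Main()
    {
        test();
        // test() has returned, but del still references the closure object,
        // so the object and its field i are still alive.
        del(1);                // prints: param = 1, i = 10
    }
}
```

Nothing is copied on each invocation: the method and the lambda both read and write the single field on the shared object, which stays alive for as long as the delegate does.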

Lasse V. Karlsen