
I seem to have run into some odd behavior of the C# compiler.

Consider the following code sample:

static void Main(string[] args)
{
    Foo(false, 8);
}

public static void Foo(bool execute, int x)
{
    if (execute)
    {
        Task.Run(() => Console.WriteLine(x));
    }
}

Running this (in release) shows some unexpected allocations happening. Examining the IL shows that the heap allocation triggered by the closure appears at the very start of the function, rather than inside the condition:

  .method public hidebysig static void 
    Foo(
      bool execute, 
      int32 x
    ) cil managed 
  {
    .maxstack 2
    .locals init (
      [0] class Test.Program/'<>c__DisplayClass1_0' 'CS$<>8__locals0'
    )

    IL_0000: newobj       instance void Test.Program/'<>c__DisplayClass1_0'::.ctor()
    IL_0005: stloc.0      // 'CS$<>8__locals0'
    IL_0006: ldloc.0      // 'CS$<>8__locals0'
    IL_0007: ldarg.1      // x
    IL_0008: stfld        int32 Test.Program/'<>c__DisplayClass1_0'::x

    // [18 13 - 18 25]
    IL_000d: ldarg.0      // execute
    IL_000e: brfalse.s    IL_0022

    // [20 17 - 20 54]
    IL_0010: ldloc.0      // 'CS$<>8__locals0'
    IL_0011: ldftn        instance void Test.Program/'<>c__DisplayClass1_0'::'<Foo>b__0'()
    IL_0017: newobj       instance void [mscorlib]System.Action::.ctor(object, native int)
    IL_001c: call         class [mscorlib]System.Threading.Tasks.Task [mscorlib]System.Threading.Tasks.Task::Run(class [mscorlib]System.Action)
    IL_0021: pop          

    // [22 9 - 22 10]
    IL_0022: ret          

  } // end of method Program::Foo
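
Roughly, this corresponds to something like the following C# (a sketch only; the type and member names here are illustrative, the real compiler-generated names are unspeakable identifiers like '<>c__DisplayClass1_0', and the same usings as above are assumed):

sealed class DisplayClass
{
    public int x;
    public void Lambda() { Console.WriteLine(x); }
}

public static void Foo(bool execute, int x)
{
    // The closure object is allocated unconditionally at method entry,
    // and the parameter is hoisted into it right away.
    var locals = new DisplayClass();
    locals.x = x;

    if (execute)
    {
        Task.Run(new Action(locals.Lambda));
    }
}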

Am I missing something here, or does anyone have an explanation for this strange behavior? Is it possible that Roslyn generates code which allocates for closures regardless of whether we actually execute them?

Shay Rojansky
    "some unexpected allocations happening" - could also mean the expectations were wrong. – H H Jan 06 '17 at 14:19

2 Answers


This behavior is by design.

When your method has a closure, all variables used inside the closure must be part of the closure class (so that the lambda can access their current values).

Had the compiler not allocated the closure immediately, it would have to copy the values from local variables to fields on the closure class when the closure instance is created, wasting time and memory.

That would also make the codegen much riskier and more complicated if multiple lambdas with different reachabilities (or, worse, nested scopes) close over the same variables.
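
To illustrate the semantic point (this snippet is not from the question, just a minimal example): closures capture variables, not values, so a write to the captured variable after the delegate is created must still be visible inside the lambda.

static void Demo(int x)
{
    Action print = () => Console.WriteLine(x);
    x = 42;     // the lambda closes over the variable x itself, not a snapshot of its value
    print();    // prints 42, not the original argument
}

Copying x into the closure lazily, at the point the delegate is created, would break exactly this: later writes to the parameter and writes made inside the lambda would no longer refer to the same storage.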

SLaks
  • Closures obviously need to close over the variables they use - that part I understand. What I don't understand is what operation happens at the very beginning of the method which cannot be deferred to later, when the method is actually used... – Shay Rojansky Jan 06 '17 at 14:03
  • It also seems the local variable value is copied in the code at the beginning of the method, so I don't understand what additional copy you're referring to in your second sentence. – Shay Rojansky Jan 06 '17 at 14:04
  • Copying the values from a local to a closure class wouldn't just be expensive, it wouldn't be semantically correct. The idea of closures is that you *close over variables, not values*, so if you copied the values, then changes to the local variable outside the closure wouldn't be observed, and it wouldn't observe changes made in the closure. – Servy Jan 06 '17 at 14:15
  • @ShayRojansky: No; the local variables only exist in the closure, so there is nothing to copy (except for parameters). – SLaks Jan 06 '17 at 14:21
  • And, Servy is quite right; if you modify `x` after the `if`, the codegen would need to check whether the closure exists yet to figure out what to write to, which is horrible. – SLaks Jan 06 '17 at 14:21
  • @SLaks I do understand that, I'm still not clear on what exactly is happening at the method start that cannot be deferred. In other words, why would a later allocation mean that local variable values would have to be copied. – Shay Rojansky Jan 06 '17 at 14:24
  • @ShayRojansky: You're asking all locals to exist twice – once as a local and once in the closure class. – SLaks Jan 06 '17 at 14:33
  • Sorry for being dense, I got it. Although in my specific code sample above, couldn't the compiler detect that the parameter isn't actually used until an inner scope? I realize this may not be a worthy optimization (though not sure); I'm mainly wanting to make sure I understand. – Shay Rojansky Jan 06 '17 at 14:39
  • @SLaks And of course, if there are multiple closures, it would mean N+1 copies to handle all of those closures, and all of those values need to be in sync with each other, and the variable assignments need to be atomic (for types that support it anyway) and so on. Clearly untenable. – Servy Jan 06 '17 at 19:29

As stated by SLaks, this behavior is by design, since x is a parameter to the function.

However, the allocation can be "moved into" the condition as follows:

public static void Foo(bool execute, int x)
{
    if (execute)
    {
        int localx = x;
        Task.Run(() => Console.WriteLine(localx));
    }
}

In this specific scenario, the transformation is safe because x is not modified within the body of Foo, nor in the lambda. Also, the if statement is not executed within a loop (in which case the transformation might actually increase the number of allocations). The compiler does not make that analysis for you, but you can.
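
To make the loop caveat concrete, here is a hypothetical variant (not from the question): with the manual copy inside a loop, each localx is a distinct captured variable, so the single up-front allocation is traded for one allocation per iteration.

public static void FooLoop(bool execute, int x, int iterations)
{
    for (int i = 0; i < iterations; i++)
    {
        if (execute)
        {
            // A fresh closure object is allocated on every iteration that takes this branch.
            int localx = x;
            Task.Run(() => Console.WriteLine(localx));
        }
    }
    // Capturing the parameter x directly would instead hoist it once,
    // costing a single closure allocation for the whole method.
}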

Kris Vandermotten
  • Thanks, yes, that's how I ended up writing things. I'm mostly surprised that the compiler doesn't do this for me, though, and am interested in knowing if there's a reason preventing it from doing so or not. After all, it's performing the lexical analysis to know the highest lexical scope where the variable is *defined* (in my case the method body because x is a parameter); it could also detect the highest lexical scope where the variable is *used* and allocate there. – Shay Rojansky Jan 06 '17 at 21:46
  • @ShayRojansky The compiler does not need a reason to not do something. In fact, the opposite is true: features are unimplemented by default. See also http://stackoverflow.com/a/8673015/1403794 and https://blogs.msdn.microsoft.com/ericlippert/2009/10/05/why-no-extension-properties/ and https://blogs.msdn.microsoft.com/ericlippert/2009/06/22/why-doesnt-c-implement-top-level-methods/ and https://blogs.msdn.microsoft.com/ericlippert/2009/05/18/foreach-vs-foreach/ – Kris Vandermotten Jan 07 '17 at 09:49
  • This kind of answer isn't very helpful. What I'm asking is (obviously) whether there are good reasons not to implement this optimization, which would reduce heap allocations in some scenarios without an immediately clear downside. It might be that the complexity of this feature doesn't warrant the performance gains, or some other problem that I haven't thought about (which is why I'm asking the community). If the only reason it's not there is that nobody's done it, it may make a good Roslyn issue. – Shay Rojansky Jan 07 '17 at 09:54
  • The immediately clear downside is the cost. All budget spent on this feature is budget that can't be spent on another feature. – Kris Vandermotten Jan 07 '17 at 09:56
  • Once again, an answer that insists on avoiding the question... Obviously every change has a budget cost, that is a well-known fact. I have no idea what Microsoft's budget is, and who knows, an open source developer (maybe even me) may decide to implement something like this, doing it for free. So once again, what I'm asking is whether there are any inherent, technical reasons why this optimization doesn't make sense. – Shay Rojansky Jan 07 '17 at 09:59
  • Why don't you implement it then, and measure performance gains on real world programs? I believe you'll find that 1) the feature is more complex than you think to analyse, develop and test; 2) the benefits in real world programs are less than you think. One of the reasons why the benefit is so low is that it is so easy to do this optimization yourself without significantly decreasing the readability of the code. Lots of things make it complex. For example, if this wasn't an `if` statement, but a `while` loop (or an if in a loop), the same transformation would probably make performance worse. – Kris Vandermotten Jan 07 '17 at 10:07
  • Why don't I implement it myself? Because before jumping into a potentially big thing I tend to first ask here on SO to hear the opinion of experts. Which is what this question is about. Your first point "the feature is more complex than you think" provides no additional information whatsoever, and I never said I thought this was trivial. – Shay Rojansky Jan 07 '17 at 10:09
  • Your second comment does include a reason. You're right that it's easy for programmers to do this themselves, but it seems to me (no way to prove this) that very few programmers are or would be aware of the problem. In other words, you have to be aware of the unwanted allocation before optimizing it away. The point about the loop is indeed interesting and may be a good reason against this... – Shay Rojansky Jan 07 '17 at 10:13
  • Hey man, I'm only trying to help, but I'm not the compiler team, so I don't have to defend their decisions. That being said, again: there may never have been a decision about this at all, for reasons already given. Secondly, the analysis to determine that this transformation is a good idea is very complex, also for reasons already given by me, Servy and @SLaks. – Kris Vandermotten Jan 07 '17 at 10:14
  • Your comment regarding the loop is the answer I was looking for, thanks for that... – Shay Rojansky Jan 07 '17 at 10:15