47

I feel I have a pretty decent understanding of closures, how to use them, and when they can be useful. But what I don't understand is how they actually work behind the scenes in memory. Some example code:

public Action Counter()
{
    int count = 0;
    Action counter = () =>
    {
        count++;
    };

    return counter;
}

Normally, if count were not captured by the closure, its lifetime would be scoped to the Counter() method, and it would go away with the rest of the stack allocation for Counter() once the method returned. But what happens when it is captured? Does the whole stack allocation for this call of Counter() stick around? Is count copied to the heap? Or is it never allocated on the stack at all, because the compiler recognizes that it is captured and therefore always places it on the heap?

For this particular question, I'm primarily interested in how this works in C#, but would not be opposed to comparisons against other languages that support closures.

Matt
  • Great question. I am not sure, but yes, you can keep the stack frame around in C#. Generators use it all the time (think LINQ for data structures), and they rely on yield under the hood. Hopefully I am not off the mark. If I am, I will learn a great deal. – Hamish Grubijan Dec 18 '09 at 14:53
  • yield turns the method into a separate class with a state machine. The stack itself isn't kept around, but the stack state is moved into class state in a compiler-generated class – thecoop Dec 18 '09 at 14:55
  • @thecoop, do you have a link explaining this please? – Hamish Grubijan Dec 18 '09 at 15:01
  • Sure, read this series if you want to understand how iterators are built: http://blogs.msdn.com/oldnewthing/archive/2008/08/12/8849519.aspx – Eric Lippert Dec 18 '09 at 15:18
  • You absolutely CANNOT "keep the stack frame around". The stack frame is on the stack! How would we pop the stack if we were keeping it alive? – Eric Lippert Dec 18 '09 at 15:19
  • Jon Skeet has a section about this in "C# in depth" :) (He even answers questions before they are asked now!?) – cwap Dec 18 '09 at 15:44

4 Answers

49

Your third guess is correct. The compiler will generate code like this:

private class Locals
{
  public int count;
  public void Anonymous()
  {
    this.count++;
  }
}

public Action Counter()
{
  Locals locals = new Locals();
  locals.count = 0;
  Action counter = new Action(locals.Anonymous);
  return counter;
}

Make sense?

Also, you asked for comparisons. VB and JScript both create closures in pretty much the same way.
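
One way to observe the effect of this rewrite is a small variation on the question's code. This is only a sketch: the class name ClosureDemo and the method MakeCounter are hypothetical, and the lambda returns the new value (Func<int> instead of Action) so the captured variable can be read from outside.

```csharp
using System;

public static class ClosureDemo
{
    // Same shape as Counter() in the question, but the lambda returns the
    // incremented value so the captured variable is observable.
    public static Func<int> MakeCounter()
    {
        int count = 0;          // hoisted into a compiler-generated class
        return () => ++count;   // the delegate holds a reference to that object
    }
}
```

Calling MakeCounter() twice should give two counters that advance independently, because each call allocates its own closure object; the Target property of each delegate refers to that object.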

Eric Lippert
  • Now that .NET handles `ref struct` better, will closures now use zero-allocation structs rather than classes for the closure when the compiler can prove the closure's lifetime? – Dai Jul 07 '20 at 01:52
  • @Dai: Great question and I do not know the answer. Back when I was at Microsoft -- recall that I left in 2012 -- we had a number of ideas for improving closure lifetimes but I do not know if any of them were implemented. – Eric Lippert Jul 08 '20 at 04:11
35

The compiler (as opposed to the runtime) creates another class/type. The function with your closure and any variables you closed over/hoisted/captured are rewritten throughout your code as members of that class. A closure in .NET is implemented as one instance of this hidden class.

That means your count variable is a member of a different class entirely, and the lifetime of an instance of that class works like that of any other CLR object: it is not eligible for garbage collection until it is no longer rooted. As long as you hold a callable reference to the method, it is not going anywhere.
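
A minimal sketch of what "rewritten as members of that class" implies in practice: two lambdas that capture the same local see the same storage. The names SharedCapture and Make are hypothetical, and the tuple return type assumes C# 7 or later.

```csharp
using System;

public static class SharedCapture
{
    // Both lambdas capture the same local, so both read and write
    // the same field on the hidden closure object.
    public static (Action Increment, Func<int> Read) Make()
    {
        int count = 0;
        Action increment = () => count++;
        Func<int> read = () => count;
        return (increment, read);
    }
}
```

After calling Increment twice, Read returns 2, even though Make() returned long ago. The hidden object stays alive for as long as either delegate is reachable.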

Joel Coehoorn
  • Inspect the code in question with Reflector to see an example of this – Greg Dec 18 '09 at 14:54
  • ...just look for the ugliest named class in your solution. –  Dec 18 '09 at 15:03
  • Does that mean a closure will result in a new heap allocation, even if the value being closured is a primitive? – Matt Nov 11 '10 at 17:14
  • @Matt - I wouldn't call it 'new', because as far as the resulting code is concerned your primitive was always on the stack. The needed closure is created at the same time as whatever object that will use the closure is created. – Joel Coehoorn Nov 11 '10 at 17:31
  • s/always on the stack/always on the heap/ – Joel Coehoorn Nov 11 '10 at 21:37
0

Thanks @HenkHolterman. Since it was already explained by Eric, I added the link just to show what actual class the compiler generates for the closure. I would like to add that the display classes created by the C# compiler can lead to memory leaks. For example, suppose a function has an int variable that is captured by one lambda expression, and another local variable in the same scope that holds a reference to a large byte array and is captured by a second lambda. The compiler creates a single display class instance that holds both variables, the int and the byte array reference. The byte array therefore cannot be garbage collected for as long as either lambda is still referenced, even the one that only uses the int.
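
To make the leak concrete, here is a hand-written stand-in for such a display class, assuming two lambdas in one scope capture different locals. It is a sketch, not real compiler output: DisplayClassSketch, LeakSketch and their members are hypothetical names, and the generated class would have an unspeakable name.

```csharp
using System;

// Hand-written stand-in for the single display class shared by two
// lambdas declared in the same scope.
public sealed class DisplayClassSketch
{
    public int Count;       // captured by the small lambda
    public byte[] Buffer;   // captured only by the other lambda

    public void Increment() => Count++;
    public int BufferLength() => Buffer.Length;
}

public static class LeakSketch
{
    public static Action Make(int size)
    {
        var locals = new DisplayClassSketch { Count = 0, Buffer = new byte[size] };

        Func<int> length = locals.BufferLength; // second "lambda", used once and dropped
        length();

        // The returned delegate only touches Count, but its Target is the
        // whole object, so Buffer stays reachable as well.
        return locals.Increment;
    }
}
```

Holding on to the returned Action keeps the byte array alive through the delegate's Target, even though Increment never reads it. A common workaround is to capture such variables in separate scopes.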

Brikesh Kumar
0

Eric Lippert's answer really hits the point. However it would be nice to build a picture of how stack frames and captures work in general. To do this it helps to look at a slightly more complex example.

Here is the capturing code:

public class Scorekeeper { 
   int swish = 7; 

   public Action Counter(int start)
   {
      int count = 0;
      Action counter = () => { count += start + swish; };
      return counter;
   }
}

And here is what I think the equivalent would be (if we are lucky Eric Lippert will comment on whether this is actually correct or not):

private class Locals
{
  public Locals( Scorekeeper sk, int st)
  { 
      this.scorekeeper = sk;
      this.start = st;
  } 

  private Scorekeeper scorekeeper;
  private int start;

  public int count;

  public void Anonymous()
  {
    this.count += start + scorekeeper.swish;
  }
}

public class Scorekeeper {
    int swish = 7;

    public Action Counter(int start)
    {
      Locals locals = new Locals(this, start);
      locals.count = 0;
      Action counter = new Action(locals.Anonymous);
      return counter;
    }
}

The point is that the local class substitutes for the entire stack frame and is initialized accordingly each time the Counter method is invoked. Typically the stack frame includes a reference to 'this', plus method arguments, plus local variables. (The stack frame is also in effect extended when a control block is entered.)

Consequently, we do not have just one object corresponding to the captured context; instead, we have one object per captured stack frame.

Based on this, we can use the following mental model: stack frames are kept on the heap (instead of on the stack), while the stack itself just contains pointers to the stack frames that are on the heap. Lambda methods contain a pointer to the stack frame. This is done using managed memory, so the frame sticks around on the heap until it is no longer needed.

Obviously the compiler can implement this by only using the heap when the heap object is required to support a lambda closure.

What I like about this model is that it provides an integrated picture for 'yield return'. We can think of an iterator method (using yield return) as if its stack frame were created on the heap and the referencing pointer stored in a local variable in the caller, for use during the iteration.
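
As a rough illustration of that picture, here is a hand-written class playing the role of the heap-allocated "frame" for a simple iterator that yields 1 through limit. It is a simplified sketch: CountToIterator and its state numbering are hypothetical, and the class the compiler really generates differs in naming and detail.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written equivalent of:
//     for (int i = 1; i <= limit; i++) yield return i;
// The parameter and the loop variable become fields on a heap object.
public sealed class CountToIterator : IEnumerator<int>
{
    private readonly int limit;  // was the method parameter
    private int i;               // was the loop local
    private int state;           // 0 = not started, 1 = running, 2 = finished

    public CountToIterator(int limit) { this.limit = limit; }

    public int Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:          // first call: run the loop initializer
                i = 1;
                state = 1;
                break;
            case 1:          // resume after a yield: run the loop increment
                i++;
                break;
            default:         // already finished
                return false;
        }

        if (i <= limit)
        {
            Current = i;     // corresponds to "yield return i"
            return true;
        }

        state = 2;
        return false;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}
```

The caller holds a reference to this object and calls MoveNext() repeatedly; nothing of the iterator's state lives on the stack between calls.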

sjb-sjb
  • It is not correct; how can private `swish` be accessed from outside class `Scorekeeper`? What happens if `start` is mutated? But more to the point: what is the value in answering an eight year old question with an accepted answer? – Eric Lippert Oct 25 '17 at 04:35
  • If you want to know what the real codegen is, use ILDASM or an IL-to-source disassembler. – Eric Lippert Oct 25 '17 at 04:37
  • A better way entirely to think of it is to stop thinking of "stack frames" as something fundamental. The stack is simply a data structure that is used to implement two things: **activation** and **continuation**. That is: what are the values associated with the activation of a method, and what code is going to run after this method returns? But the stack is only a suitable data structure for storing activation/continuation information **if method activation lifetimes logically form a stack**. – Eric Lippert Oct 25 '17 at 04:41
  • Since lambdas, iterator blocks and async all enable method activation lifetimes which do not logically form stacks, the stack cannot be used as a data structure for activations and continuations. So the data structures have to be allocated on the long term pool – Eric Lippert Oct 25 '17 at 04:44
  • Your comments on activations and continuations make sense. Activations usually happen from existing frames, though, so there is kind of an implied ordering to the frames. The fact is that earlier frames can terminate while later frames may continue for a longer time, so there are gaps. In addition I suppose we can also have frames generated by asynchronous hardware events. As for why answer an old question, well, I'm getting value out of what you just posted a minute ago :-). I guess I should have commented on your answer rather than start a new one. – sjb-sjb Oct 25 '17 at 04:53