
I was reading a book that shows the following example:

for (int i = 0; i < 10; i++)
{
    Task.Factory.StartNew(() => Console.WriteLine(i));
}
Console.ReadLine();

Below is the quote from the author:

The obvious intent of the code is to print out all the numbers from 0 to 9. They won’t necessarily be in order, because you are simply issuing work to the thread pool infrastructure, and thus you have no control over the order in which the tasks will run. But if you run the code, you will most likely have seen ten 10s on the screen. The cause lies with the closure; the compiler will have had to capture local variable i and place it into a compiler-generated object on the heap, so that it can be referenced inside each of the lambdas. The question is, when does it create this object? As the local variable i is declared outside the loop body, the capture point is, therefore, also outside the loop body. This results in a single object being created to hold the value of i, and this single object is used to store each increment of i. Because each task will share the same closure object, by the time the first task runs the main thread will have completed the loop, and hence i is now 10. Therefore, all 10 tasks that have been created will print out the same value of i, namely 10.

I don't quite understand the part that says "the compiler will have had to capture local variable i and place it into a compiler-generated object on the heap". Isn't i an integer, which is a value type that resides on the stack? How can we create an object on the heap that contains a memory reference to the local variable i on the stack?

The author provides a fix:

for (int i = 0; i < 10; i++)
{
    int toCaptureI = i;
    Task.Factory.StartNew(() => Console.WriteLine(toCaptureI));
}
Console.ReadLine();

I still don't understand this. If a new object is created on the heap to hold the reference to toCaptureI, won't toCaptureI be assigned a new value on the stack in each iteration? Isn't that the same as the faulty code above?

  • `toCaptureI` is only known inside brackets, which is not the case for `i`. A new reference is created at each loop. – Mad hatter Aug 01 '23 at 15:00
  • In c# the variable `i` can not be accessed outside the loop (which, if I remember correctly, is not true for C++), see [link](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/iteration-statements). – Elec1 Aug 01 '23 at 15:08
  • @Elec1 Indeed you can't access it, but you can't name another `i` variable either, showing its declaration is made outside the loop. – Mad hatter Aug 01 '23 at 15:10
  • @Elec1 The link you provide has an interesting construct : `int i; for(i = 0 [...]` making `i` accessible from outside the loop obviously – Mad hatter Aug 01 '23 at 15:15
  • @Mad hatter You can't compare this construct with one where the loop variable is defined in the loop statement. The variant that defines the loop variable externally of course makes sense if we need the loop counter after the for-loop has terminated. – Elec1 Aug 02 '23 at 09:46
  • It sounds like you believe the falsehood that value types live on the stack. That is simply false, so stop believing it. The fact is: short-lived variables live in the short-lifetime storage pool and long-lived variables live in the long-lifetime variable pool. When you say it that way, it is obviously correct and obviously has nothing to do with whether the variable stores an int or not. – Eric Lippert Aug 17 '23 at 22:25
  • A closed-over local potentially has a long lifetime, so closed-over local variables are allocated from the long-lifetime pool: the managed heap. Simply stop believing falsehoods and your understanding will improve rapidly. – Eric Lippert Aug 17 '23 at 22:26
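
The lifetime point in the last two comments can be illustrated with a minimal sketch. The class `LifetimeDemo` and the helper `MakeCounter` are hypothetical names made up for this illustration and do not come from the book or the question. The local `count` is an `int`, but because the returned lambda closes over it, it has to outlive the method call that declared it:

```csharp
using System;

public static class LifetimeDemo
{
    // Hypothetical helper for illustration: returns a delegate that
    // increments and returns the captured local variable.
    public static Func<int> MakeCounter()
    {
        int count = 0;          // closed-over local, so its lifetime is extended
        return () => ++count;   // the lambda keeps using 'count' after we return
    }

    public static void Main()
    {
        Func<int> next = MakeCounter();
        Console.WriteLine(next());   // 1
        Console.WriteLine(next());   // 2, the same variable is still alive

        Func<int> other = MakeCounter();
        Console.WriteLine(other());  // 1, a separate call gets its own variable
    }
}
```

If `count` lived in the stack frame of `MakeCounter`, it would be gone by the time `next()` runs, which is why a closed-over local needs long-lifetime storage.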

4 Answers


To invoke StartNew, you need to provide a delegate. A delegate is just a function pointer (and optionally an object reference), and there is no automatic way for the delegate to access i, so the compiler rewrites the code as something like:

class CaptureState {
    public int i;
    public void TheMethod() {
         Console.WriteLine(i);
    }
}
var magic = new CaptureState();
for (magic.i = 0; magic.i < 10; magic.i++)
{
    Task.Factory.StartNew(magic.TheMethod);
}

Now you can see that there is a single object and that i is shared between all the workers. If we rewrite the loop as the author suggests, the result changes, because the capture scope is determined by the point where the captured variable is declared:

class CaptureState {
    public int toCaptureI;
    public void TheMethod() {
         Console.WriteLine(toCaptureI);
    }
}
for (int i = 0; i < 10; i++)
{
    var magic = new CaptureState();
    magic.toCaptureI = i;
    Task.Factory.StartNew(magic.TheMethod);
}

Now you can see that each capture object is independent and holds the value from the relevant loop iteration.
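
The difference between the two rewrites can also be observed without any threading. The following is a minimal sketch with hypothetical names (`CaptureDemo`, `SharedCapture`, `PerIterationCapture`); it stores the delegates in a list and only invokes them after the loop has finished, which is effectively what happens when the tasks are scheduled late:

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    // Captures the loop variable itself: one shared variable for all lambdas.
    public static List<int> SharedCapture()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 10; i++)
        {
            actions.Add(() => i);
        }
        // The loop has finished, so every delegate reads the final value of i.
        return actions.ConvertAll(f => f());
    }

    // Captures a variable declared inside the loop body: one per iteration.
    public static List<int> PerIterationCapture()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 10; i++)
        {
            int copy = i;   // a fresh variable, and a fresh closure object, each time
            actions.Add(() => copy);
        }
        return actions.ConvertAll(f => f());
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", SharedCapture()));       // 10,10,10,10,10,10,10,10,10,10
        Console.WriteLine(string.Join(",", PerIterationCapture())); // 0,1,2,3,4,5,6,7,8,9
    }
}
```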

Marc Gravell

"I still don't understand this. If a new object is created on the heap to hold the reference to toCaptureI, won't toCaptureI be assigned a new value on the stack in each iteration? Isn't that the same as the faulty code above?"

The compiler creates a new instance of a specially generated type that stores the closure on every iteration of the loop, so that it can capture the local variable declared in the loop body. By contrast, only a single instance is created when you capture the loop iteration variable i.

You can play with decompilation at sharplab.io. The first one will result in something like the following (where <>c__DisplayClass0_0 is the compiler-generated class that stores the closure, <>c__DisplayClass0_ is the local variable holding its instance, and <<Main>$>b__0 is the method representing your lambda):

<>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
<>c__DisplayClass0_.i = 0;
while (<>c__DisplayClass0_.i < 10)
{
    Task.Factory.StartNew(new Action(<>c__DisplayClass0_.<<Main>$>b__0));
    <>c__DisplayClass0_.i++;
}

While the second one will produce something like:

int num = 0;
while (num < 10)
{
    <>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
    <>c__DisplayClass0_.toCaptureI = num;
    Task.Factory.StartNew(new Action(<>c__DisplayClass0_.<<Main>$>b__0));
    num++;
}
Guru Stron

This is related to how async/await works. A delegate is just a pointer, and the delegates passed to StartNew typically run after i has already become 10. But when you await StartNew, like this:

await Task.Factory.StartNew(() => Console.WriteLine(i));

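it prints each value from 0 to 9, as the following code does (although the awaited version runs the tasks one at a time, in order):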

for (int i = 0; i < 10; i++)
{
    int toCaptureI = i;
    Task.Factory.StartNew(() => Console.WriteLine(toCaptureI));
}
Console.ReadLine();

This just shows how delegates and the order of execution work.
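
A minimal sketch of the awaited variant, with hypothetical names (`AwaitDemo`, `RunAwaitedAsync`) and with the values collected in a list instead of written to the console so they can be inspected. It still captures the shared loop variable i, but because each task is awaited, the loop does not reach i++ until the current task has finished:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitDemo
{
    public static async Task<List<int>> RunAwaitedAsync()
    {
        var seen = new List<int>();
        for (int i = 0; i < 10; i++)
        {
            // The lambda still captures the shared variable i, but awaiting
            // here means i is not incremented until this task has completed.
            await Task.Factory.StartNew(() => seen.Add(i));
        }
        return seen;
    }

    public static async Task Main()
    {
        List<int> seen = await RunAwaitedAsync();
        Console.WriteLine(string.Join(",", seen)); // 0,1,2,3,4,5,6,7,8,9
    }
}
```

Note that this serializes the work: only one task runs at a time, so it avoids the symptom by giving up the concurrency of the original loop.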

0

"isn't that i is an integer type which is a value type that resides on the stack?"

Unfortunately, this isn't always the case. Additionally, the compiler is doing more behind the scenes than you might think, because of the closure and the delegate (here an Action), and that affects how the variable is stored.

You can get a better idea of what's really going on by going to https://sharplab.io/ (or any other decompiler, ILDASM, etc.) and pasting the code there. The C# decompile option gives you this:

public class Program
{
    [CompilerGenerated]
    private sealed class <>c__DisplayClass0_0
    {
        public int i;

        internal void <Main>b__0()
        {
            Console.WriteLine(i);
        }
    }

    public static void Main()
    {
        <>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
        <>c__DisplayClass0_.i = 0;
        while (<>c__DisplayClass0_.i < 10)
        {
            Task.Factory.StartNew(new Action(<>c__DisplayClass0_.<Main>b__0));
            <>c__DisplayClass0_.i++;
        }
        Console.ReadLine();
    }
}

As you can see, the "closure" is a compiler-generated class <>c__DisplayClass0_0 which holds the variable i. There is only one instance of it, so this is the shared variable that every task reads.

Versus the "fixed" code:

public class Program
{
    [CompilerGenerated]
    private sealed class <>c__DisplayClass0_0
    {
        public int toCaptureI;

        internal void <Main>b__0()
        {
            Console.WriteLine(toCaptureI);
        }
    }

    public static void Main()
    {
        int num = 0;
        while (num < 10)
        {
            <>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
            <>c__DisplayClass0_.toCaptureI = num;
            Task.Factory.StartNew(new Action(<>c__DisplayClass0_.<Main>b__0));
            num++;
        }
        Console.ReadLine();
    }
}

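As you can see, the loop now uses a plain local variable num, and each iteration creates a new closure object whose toCaptureI field holds a copy of the current value. The functions running concurrently therefore no longer share a single variable.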
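
To check the fixed version with real tasks, here is a minimal sketch under a few assumptions: the names `FixedLoopDemo` and `RunFixed` are hypothetical, the values are collected in a `ConcurrentBag<int>` rather than printed, and `Task.WaitAll` replaces `Console.ReadLine()` as the way to wait for the work to finish. The results are sorted before being returned because the tasks may run in any order:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FixedLoopDemo
{
    public static int[] RunFixed()
    {
        var seen = new ConcurrentBag<int>();   // thread-safe, tasks add concurrently
        var tasks = new List<Task>();
        for (int i = 0; i < 10; i++)
        {
            int toCaptureI = i;                // one captured variable per iteration
            tasks.Add(Task.Factory.StartNew(() => seen.Add(toCaptureI)));
        }
        Task.WaitAll(tasks.ToArray());         // block until every task has run
        return seen.OrderBy(x => x).ToArray(); // execution order is not guaranteed
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", RunFixed())); // 0,1,2,3,4,5,6,7,8,9
    }
}
```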

"How can we create an object on the heap that contains a memory reference to the local variable i on the stack?"

See Eric Lippert's explanation here. The takeaway is that you shouldn't try to treat this as a "stack" vs. "heap" problem, because we don't always have complete control over how things get stored. It is more about understanding what the compiler is doing behind the scenes.