2

I have a scope question regarding Linq expressions that are defined in a loop. The following LinqPad C# Program demonstrates the behaviour:

void Main()
{
    string[] data=new string[] {"A1", "B1", "A2", "B2" };
    string[] keys=new string[] {"A", "B" };

    List<Result> results=new List<Result>();

    foreach (string key in keys) {
        IEnumerable<string> myData=data.Where (x => x.StartsWith(key));     
        results.Add(new Result() { Key=key, Data=myData});          
    }   
    results.Dump();
}

// Define other methods and classes here
class Result {
    public string Key { get; set; }
    public IEnumerable<string> Data { get; set; }
}

Basically, "A" should have data [A1, A2] and "B" should have data [B1, B2].

However, when you run this "A" gets data [B1, B2] as does B. I.e. the last expression is evaluated for all instances of Result.

Given that I declared "myData" inside the loop, why is it behaving as if I declared it outside the loop? EG it is acting like I would expect if I did this:

void Main()
{
    string[] data=new string[] {"A1", "B1", "A2", "B2" };
    string[] keys=new string[] {"A", "B" };

    List<Result> results=new List<Result>();

    IEnumerable<string> myData;                 
    foreach (string key in keys) {
        myData=data.Where (x => x.StartsWith(key));     
        results.Add(new Result() { Key=key, Data=myData});          
    }   
    results.Dump();
}

// Define other methods and classes here
class Result {
    public string Key { get; set; }
    public IEnumerable<string> Data { get; set; }
}

I get the desired result if I force the evaluation inside the iteration, that is not my question.

I'm asking why "myData" is seemingly shared across iterations given that I declared it within scope of a single iteration?

Somebody call Jon Skeet... ;^)

Riko
  • 113
  • 1
  • 8

1 Answers1

5

It's not myData which is being shared - it's key. And as the values within myData are evaluated lazily, they depend on the current value of key.

It's behaving that way because the scope of the iteration variable is the whole loop, not each iteration of the loop. You've got a single key variable whose value changes, and it's the variable which is captured by the lambda expression.

The correct fix is just to copy the iteration variable into a variable within the loop:

foreach (string key in keys) {
    String keyCopy = key;
    IEnumerable<string> myData = data.Where (x => x.StartsWith(keyCopy));     
    results.Add(new Result() { Key = key, Data = myData});
}

For more information about this problem, see Eric Lippert's blog post "Closing over the loop variable considered harmful": part one, part two.

It's an unfortunate artifact of the way the language has been designed, but changing it now would be a bad idea IMO. Although any code which changed behaviour would basically have been broken beforehand, it would mean that correct code in (say) C# 6 would be valid but incorrect code in C# 5, and that's a dangerous position to be in.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • ReSharper will actually complain about this issue and catch it for you. I believe it refers to the problem as attempted use of a modified closure. – Yuck May 27 '11 at 15:12
  • 1
    FYI odds are very good that this will get changed in the next version. We've been in talks with the CLR compatibility committee and we think we can get this change in without too much risk of breaking people. I agree that the risk of making C# 5 code which is incorrect when recompiled in C# 4 is an important one to consider, but I think the benefit of making the fix is pretty large. – Eric Lippert May 30 '11 at 02:15
  • @Eric: Thanks for the heads-up. I agree that the upside would be large. A whole class of SO questions which could be avoided... and while it would be nice to have some way of optionally telling devs that they were relying on this, that would be a slightly strange sort of flag. – Jon Skeet May 30 '11 at 06:53