3

Found a c# head-scratcher. In the foreach loop, using the parent.Id property directly in a Enumerable.Where does not work. Putting it in a variable first works. There is no issue with directly accessing parent.Id in the Select statement.

    List<Person> people = new List<Person>() { 
        new Person() { Id = 1, name = "John", parentId = null },
        new Person() { Id = 2, name = "Sarah", parentId = null },
        new Person() { Id = 3, name = "Daniel", parentId = 1 },
        new Person() { Id = 4, name = "Peter", parentId = 1 }
    };

    List<object> peopleTree = new List<object>();

    var parents = people.Where(p => !p.parentId.HasValue);
    foreach (Person parent in parents)
    {
        int parentId = parent.Id;
        var children = people
            //.Where(p => p.parentId.Equals(parentId)) //This works, is able to find the children
            .Where(p => p.parentId.Equals(parent.Id)) //This does not work, no children for John
            .Select(p => new { Id = p.Id, Name = p.name, pId = parent.Id }); //pId set correctly

        peopleTree.Add(new
        {
            Id = parent.Id,
            Name = parent.name,
            Children = children
        });
    }

Alternatively, if I use a for loop and put parent in a variable first, I can access the parent.Id property directly in the Where statement.

var parents = people.Where(p => !p.parentId.HasValue).ToArray();
for (int idx = 0; idx < parents.Count(); idx++)
{
    var parent = parents[idx];
...

I could not find an answer to why it behaves like this. Can anyone explain it?

Vincejtl
  • 169
  • 1
  • 6
  • In the second example you're calling 'ToArray'. This would probably also make your first example work. – Rufus L May 04 '15 at 16:18
  • Which version of framework are you using? May be I am not getting your question since it works fine for me. – danish May 04 '15 at 16:22
  • @danish The framework version is irrelevant; the compiler version *is* relevant. The behavior was changed in C# 5.0. – Servy May 04 '15 at 16:45

3 Answers3

4

This problem is created by the deferred execution of the children. Essentially, the value of parent at the time when children is evaluated is different. Geekspeak for this is Accessing Modified Closure.

You can fix it by introducing a temporary the way you did, or by forcing evaluation to happen while the foreach loop is still in the current iteration:

var children = people
    .Where(p => p.parentId.Equals(parent.Id))
    .Select(p => new { Id = p.Id, Name = p.name, pId = parent.Id })
    .ToList();
Community
  • 1
  • 1
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • 3
    Or just compile the code using the C# 5.0 compiler, rather than an earlier one. – Servy May 04 '15 at 16:19
  • @Servy That too :) I think Eric Lippert explained the rationale for this change in his blog, but I cannot find the post. – Sergey Kalinichenko May 04 '15 at 16:23
  • 1
    @dasblinkenlight It's here: http://stackoverflow.com/questions/8649337/is-access-to-modified-closure-resolved-by-comprehension-syntax/8649429#8649429 It's also linked from my answer – Joel Coehoorn May 04 '15 at 16:28
2

This is caused by the lazy nature of linq queries. Linq queries will "materialize" as late as possible to avoid doing potentially unnecessary work.

children is a non-materialized IEnumerable<T>. It won't be actually filled with elements. There is a significant difference between parent and parentId, used in your two .Where() calls. parent is only declared once, but parentId is scoped inside the loop, so effectively is declared multiple times. At the time children is eventually materialized parent has changed values. It will refer to the last element in parents, which is not what you intended.

You can force eager evaluation like this.

    var children = people
        .Where(p => p.parentId.Equals(parent.Id)) 
        .Select(p => new { Id = p.Id, Name = p.name, pId = parent.Id })
        .ToArray();  <---- this forces materialization
recursive
  • 83,943
  • 34
  • 151
  • 241
  • 1
    It isn't quite right to say that an IEnumerable itself is **ever** "materialized" with elements. Lists or arrays may be, but you can also have iterator block enumerable that never handle more than one item at a time, and the objects returned by the Select()/Where()/etc extension methods are actually state machines that operate on the original collection... they are never by themselves materialzed, even if you use them in a `foreach`. – Joel Coehoorn May 04 '15 at 16:29
  • @JoelCoehoorn: You're right. I edited my answer to reflect that. – recursive May 04 '15 at 16:44
1

The problem is on the statement that begins like this:

var children = people ...

This statement does not result it in a collection that actually stores values... it results in an IEnumerable object that knows how to iterate over a collection. The instructions used by this object happen to reference the parent variable from the loop. That variable is captured for the Enumerable into something called a closure. Later on, when you actually use the Enumerable object to access items, it looks back at that parent variable.

Here's the trick: there is one parent variable that mutates for each iteration through the original loop. At the end of the loop, all of the items in your parents collection are using the same parent object. Copying the parent.Id value to a variable inside the loop fixes the problem because you're now dealing with a new variable for the closure with each iteration through the loop.

You can also fix this by using a .ToList() call at the end of the statement indicated earlier to evaluate the Enumerable object while still inside the loop. However, I prefer your existing solution, because it's more memory efficient if you never need to expand all of those Children at the same time.

The good news is that this problem is fixed for C# 5.

Community
  • 1
  • 1
Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794