
I recently came across some code that does not behave how I would have expected.

1: int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8 };
2: IEnumerable<int> result = numbers.Select(n => n % 2 == 0 ? n : 0);
3: 
4: int a = result.ElementAt(0);
5: numbers[0] = 10;
6: int b = result.ElementAt(0);

When I stepped through this code with Visual Studio, I was surprised to see that the yellow highlighting jumped from line 4 back to the lambda expression on line 2, then again from line 6 to the lambda on line 2.

Moreover, the value of a after running this code is 0 and the value of b is 10.

The original code that made me realize that this could/would happen involved a method call within the Select(), and accessing any property or specific element of the IEnumerable resulted in the method within Select() being called again and again.

// The following code prints out:
// Doing something... 1
// Doing something... 5
// Doing something... 1
// Doing something... 2
// Doing something... 3
// Doing something... 4
// Doing something... 5

using System;
using System.Linq;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        int[] numbers = { 1, 2, 3, 4, 5 };
        IEnumerable<int> result = numbers.Select(DoSomething);

        int a = result.ElementAt(0);
        int b = result.ElementAt(4);
        int c = result.Count();
    }

    static int DoSomething(int x)
    {
        Console.WriteLine("Doing something... " + x);
        return x;
    }
}

I feel like I now understand how the code will behave (and I've found other questions online that are the result of this behavior). However, what exactly causes the code within the Select() to be called from later lines?

elmer007
  • I feel like the question marked as a duplicate is very broad, and I've edited this one to try to be clear that I'm looking for something much more specific or behind-the-scenes – elmer007 Oct 15 '18 at 21:53
  • **Note**: I've reopened this since the duplicate is really too broad and doesn't answer the question in a manner that can be easily translated to this code – Camilo Terevinto Oct 15 '18 at 22:15
  • Note: Change `numbers.Select(DoSomething)` to `numbers.Select(DoSomething).ToList()` to buffer the results so it won't re-evaluate the LINQ expression again. – Dai Oct 15 '18 at 22:22
  • Yes, LINQ does deferred execution. It's a clever feature that can have surprising side effects. Look at the source code for Select: https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,0e5ab1a57b7e1438 – pm100 Oct 15 '18 at 22:26

3 Answers


You have a reference to a LINQ query, which is evaluated every time you iterate over it.

From the docs (you can see this is called Deferred Execution):

As stated previously, the query variable itself only stores the query commands. The actual execution of the query is deferred until you iterate over the query variable in a foreach statement. This concept is referred to as deferred execution

...

Because the query variable itself never holds the query results, you can execute it as often as you like. For example, you may have a database that is being updated continually by a separate application. In your application, you could create one query that retrieves the latest data, and you could execute it repeatedly at some interval to retrieve different results every time.

So, when you have

IEnumerable<int> result = numbers.Select(DoSomething);

You have a reference to a query that will transform each element in numbers to the result of DoSomething.
So, you could say that the following:

int a = result.ElementAt(0);

iterates result up until the first element. The same happens for ElementAt(4), but this time it iterates until the fifth element. Notice that only Doing something... 5 is printed for that call: the selector runs once, for the element that is actually returned. This relies on a shortcut that .NET Core takes for array sources; an implementation that walks the sequence element by element would print 1 through 5 here. The call would fail if the query, at that moment, couldn't produce 5 items.
The .Count call again iterates the result query and returns the number of elements at that moment.
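
To see the re-evaluation without depending on how ElementAt or Count happen to be implemented on a given runtime, here is a minimal sketch that counts selector calls across two plain foreach loops. The `ReEvaluationSketch` class and the `calls` counter are made up for this illustration and are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReEvaluationSketch
{
    // Hypothetical helper: enumerates the same query twice and reports
    // how many times the selector ran in total.
    public static int CountSelectorCalls()
    {
        int calls = 0;
        int[] numbers = { 1, 2, 3 };

        // Nothing runs here; the query only remembers the lambda.
        IEnumerable<int> query = numbers.Select(n => { calls++; return n * 2; });

        int sum = 0;
        foreach (int value in query) sum += value;   // first pass: 3 selector calls
        foreach (int value in query) sum += value;   // second pass: 3 more

        return calls;
    }

    static void Main()
    {
        Console.WriteLine(CountSelectorCalls()); // prints 6
    }
}
```

Each foreach asks the query for a fresh enumerator, so the lambda runs once per element per pass.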

If, instead of keeping the reference to the query, you kept a reference to the results, i.e.:

IEnumerable<int> result = numbers.Select(DoSomething).ToArray();
// or
IEnumerable<int> result = numbers.Select(DoSomething).ToList();

You would only see this output:

// Doing something... 1
// Doing something... 2
// Doing something... 3
// Doing something... 4
// Doing something... 5
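
A small sketch of why the buffered version prints each line only once, assuming the same five-element array. The `BufferingSketch` class and its `calls` counter are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BufferingSketch
{
    // Hypothetical helper: returns how many times the selector ran.
    public static int CountSelectorCallsWithToList()
    {
        int calls = 0;
        int[] numbers = { 1, 2, 3, 4, 5 };

        // ToList() enumerates the query once, right here, and stores the results.
        List<int> buffered = numbers.Select(n => { calls++; return n; }).ToList();

        // These read from the list; the selector is not involved any more.
        int a = buffered.ElementAt(0);
        int b = buffered.ElementAt(4);
        int c = buffered.Count();

        return calls;
    }

    static void Main()
    {
        Console.WriteLine(CountSelectorCallsWithToList()); // prints 5
    }
}
```

The trade-off is that the list is a snapshot: later changes to numbers are not reflected in it.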
Camilo Terevinto
  • This is very interesting. When you say that "The same happens for ElementAt(4), but this time it iterates until the fifth element", does that suggest that the `ElementAt(4)` call should print out all the numbers? It only prints the number 5 – elmer007 Oct 15 '18 at 22:45
  • @elmer007 No, sorry, that was a mental bug. It should only print 5. The thing is that `Current` is only evaluated once – Camilo Terevinto Oct 15 '18 at 23:04

Let's break this down piece by piece until you understand it. Trust me: take your time reading this, and it will both clarify how Enumerable types work and answer your question.

Look at the IEnumerable interface, which is the base of IEnumerable<T>. It contains one method: IEnumerator GetEnumerator();

Enumerables are a tricky beast because they can do whatever they want. All that really matters is the call to GetEnumerator(), which happens automatically in a foreach loop, or which you can make manually.

What does GetEnumerator() do? It returns another interface, IEnumerator.

This is the magic. IEnumerator has one property and two methods.

object Current { get; }
bool MoveNext();
void Reset();

Let's break down the magic.

First let me explain what they are typically, and I say typically because like I mentioned it can be a tricky beast. You're allowed to implement this however you choose... Some types don't follow the standards.

object Current { get; } is obvious. It gets the current object in the IEnumerator; by default this might be null.

bool MoveNext(); This returns true if there is another object in the IEnumerator and it should set the Current value to that new object.

void Reset(); tells the type to start over from the beginning.
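
As an example of types not following the typical pattern, here is a small sketch comparing Reset() on two enumerators. The `ResetSketch` class and the `OneTwoThree` sequence are hypothetical; the point is only that enumerators generated from a `yield return` method throw NotSupportedException on Reset(), while the enumerator of List<T> does start over.

```csharp
using System;
using System.Collections.Generic;

static class ResetSketch
{
    // A compiler-generated iterator (hypothetical example sequence).
    static IEnumerable<int> OneTwoThree()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static bool IteratorSupportsReset()
    {
        using (IEnumerator<int> e = OneTwoThree().GetEnumerator())
        {
            e.MoveNext();                 // Current is now 1
            try
            {
                e.Reset();                // iterator methods do not implement Reset
                return true;
            }
            catch (NotSupportedException)
            {
                return false;
            }
        }
    }

    public static int FirstAfterReset()
    {
        IEnumerator<int> e = new List<int> { 7, 8, 9 }.GetEnumerator();
        e.MoveNext();
        e.MoveNext();                     // Current is now 8
        e.Reset();                        // List<T>'s enumerator starts over
        e.MoveNext();
        return e.Current;
    }

    static void Main()
    {
        Console.WriteLine(IteratorSupportsReset()); // prints False
        Console.WriteLine(FirstAfterReset());       // prints 7
    }
}
```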

Now let's implement this. Please take the time to review this IEnumerator type so that you understand it. Realize that when you reference an IEnumerable type you are not referencing the IEnumerator itself; you're referencing a type that returns this IEnumerator via GetEnumerator().

Note: Be careful not to confuse the names. IEnumerator is different than IEnumerable.

IEnumerator

using System.Collections; // namespace of the non-generic IEnumerator and IEnumerable

public class MyEnumerator : IEnumerator
{
    private string First => nameof(First);
    private string Second => nameof(Second);
    private string Third => nameof(Third);
    private int counter = 0;

    public object Current { get; private set; }

    public bool MoveNext()
    {
        if (counter > 2) return false;

        counter++;
        switch (counter)
        {
            case 1:
                Current = First;
                break;
            case 2:
                Current = Second;
                break;
            case 3:
                Current = Third;
                break;                    
        }
        return true;
    }

    public void Reset()
    {
        counter = 0;
    }
}

Now, let's make an IEnumerable type and use this IEnumerator.

IEnumerable

public class MyEnumerable : IEnumerable
{
    public IEnumerator GetEnumerator() => new MyEnumerator();
}

This is something to soak in... When you make a call like numbers.Select(n => n % 2 == 0 ? n : 0) you aren't iterating any items; you're getting back a type much like the one above. .Select(…) returns IEnumerable<int>, and as shown above, IEnumerable is nothing more than an interface that exposes GetEnumerator(). That method is called whenever you enter a looping construct, or you can call it manually. With that in mind, you can already see that the iteration never starts until GetEnumerator() is called, and even then nothing happens until you call MoveNext() on the IEnumerator it returns.

In other words, all you have after that call is a reference to an IEnumerable<T> and nothing more. No iterations have taken place. This is why the debugger jumps back up in your code: ElementAt is what finally iterates, and that is when the lambda expression runs. Stay with me and I'll later update an example to take this lesson full circle, but for now let's continue our simple example:
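
To tie this back to Select, here is a rough sketch of a hand-rolled stand-in that records when the lambda runs. `MySelect`, `MySelectSketch` and `Trace` are hypothetical names; this is not the real LINQ implementation, only an illustration of the same deferral.

```csharp
using System;
using System.Collections.Generic;

static class MySelectSketch
{
    // Hypothetical stand-in for Select, written as an iterator method.
    public static IEnumerable<TResult> MySelect<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        foreach (TSource item in source)
            yield return selector(item);   // runs only while someone is iterating
    }

    // Returns a log of the order in which things happened.
    public static List<string> Trace()
    {
        var log = new List<string>();
        int[] numbers = { 1, 2 };

        IEnumerable<int> result = numbers.MySelect(n => { log.Add("lambda " + n); return n; });
        log.Add("query created");

        using (IEnumerator<int> e = result.GetEnumerator())
        {
            log.Add("enumerator created");
            e.MoveNext();                  // the first lambda call happens here
            log.Add("after MoveNext");
        }
        return log;
    }

    static void Main()
    {
        foreach (string entry in Trace())
            Console.WriteLine(entry);
        // prints:
        // query created
        // enumerator created
        // lambda 1
        // after MoveNext
    }
}
```

The lambda entry shows up only after MoveNext() is called, which matches the jump back to the lambda that the debugger shows.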

Let's now make a simple console app to test our new types.

Console App

class Program
{
    static void Main(string[] args)
    {
        var myEnumerable = new MyEnumerable();

        foreach (var item in myEnumerable)
            Console.WriteLine(item);

        Console.ReadKey();
    }

    // OUTPUT
    // First
    // Second
    // Third
}

Now let's do the same thing but make it generic. I won't write as much but monitor the code closely for changes and you'll get it.

I'm going to copy and paste it all in one.

Entire Console App

using System;
using System.Collections;
using System.Collections.Generic;

namespace Question_Answer_Console_App
{
    class Program
    {
        static void Main(string[] args)
        {
            var myEnumerable = new MyEnumerable<Person>();

            foreach (var person in myEnumerable)
                Console.WriteLine(person.Name);

            Console.ReadKey();
        }

        // OUTPUT
        // Test 0
        // Test 1
        // Test 2
    }

    public class Person
    {
        static int personCounter = 0;
        public string Name { get; } = "Test " + personCounter++;
    }

    public class MyEnumerator<T> : IEnumerator<T>
    {
        private T First { get; set; }
        private T Second { get; set; }
        private T Third { get; set; }
        private int counter = 0;

        object IEnumerator.Current => Current; // return the element itself, not a cast to IEnumerator<T>
        public T Current { get; private set; }

        public bool MoveNext()
        {
            if (counter > 2) return false;

            counter++;
            switch (counter)
            {
                case 1:
                    First = Activator.CreateInstance<T>();
                    Current = First;
                    break;
                case 2:
                    Second = Activator.CreateInstance<T>();
                    Current = Second;
                    break;
                case 3:
                    Third = Activator.CreateInstance<T>();
                    Current = Third;
                    break;
            }
            return true;
        }

        public void Reset()
        {
            counter = 0;
            First = default;
            Second = default;
            Third = default;
        }

        public void Dispose() => Reset();
    }

    public class MyEnumerable<T> : IEnumerable<T>
    {
        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
        public IEnumerator<T> GetEnumerator() => new MyEnumerator<T>();
    }
}

So let's recap... IEnumerable<T> is a type that has a method that returns an IEnumerator<T> type. The IEnumerator<T> type has the T Current { get; } property as well as the IEnumerator methods.

Let's break this down one more time in code and call out the pieces manually so that you can see it more clearly. This will be only the console part of the app because everything else stays the same.

Console App

class Program
{
    static void Main(string[] args)
    {
        IEnumerable<Person> enumerable = new MyEnumerable<Person>();
        IEnumerator<Person> enumerator = enumerable.GetEnumerator();

        while (enumerator.MoveNext())
            Console.WriteLine(enumerator.Current.Name);

        Console.ReadKey();
    }
    // OUTPUT
    // Test 0
    // Test 1
    // Test 2
}

FYI: One thing to point out about the answer above is that there are two flavors of LINQ. LINQ in EF or LINQ-to-SQL uses different extension methods than LINQ over in-memory objects. The main difference is that a query expression against a database returns IQueryable<T>, which builds an expression tree that the provider translates into SQL and runs on the database. In other words, something like a .Where(…) clause doesn't pull the entire table into memory and then iterate over it; the expression is turned into SQL. That's why some calls, like .Equals(), may not work in those specific lambda expressions.
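
A tiny sketch of the IQueryable side, using AsQueryable() over an in-memory array as a stand-in, since no database is involved here. `QueryableSketch` is a hypothetical name; with EF or LINQ-to-SQL the provider would translate the expression tree into SQL, whereas this in-memory provider just compiles and runs it.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

static class QueryableSketch
{
    public static Expression BuildExpression()
    {
        int[] numbers = { 1, 2, 3, 4 };

        // On IQueryable<T>, Where receives the lambda as an expression tree
        // (data describing the code) instead of a compiled delegate.
        IQueryable<int> query = numbers.AsQueryable().Where(n => n > 2);

        return query.Expression;
    }

    static void Main()
    {
        Expression expression = BuildExpression();
        Console.WriteLine(expression.NodeType); // prints Call
        Console.WriteLine(expression);          // a textual form of the Where call
    }
}
```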

Michael Puckett II
  • This is nice, but I think you need to include something about deferred execution. – ldam Oct 16 '18 at 13:17
  • @Logan I thought I explained exactly what deferred execution is behind the scenes. It's nothing other than holding the type reference and never using it until needed. There's another long lesson to tie up deferred execution in fullness and when you start using Linq with Data (EF, Linq-to-SQL, etc.) then it's a bit different. I did include an FYI about that. I agree with your concern but if you break what I've written down, this is the base of all of it, and the logic should stick. I'll update it with a tidbit about Deferred Execution but understanding it is better than labeling it IMO. – Michael Puckett II Oct 16 '18 at 17:50
  • Right, I worded that wrong. You explained deferred execution without saying deferred execution :) I agree that it's better to understand than to label it, but the label is important too so people know what you're talking about. – ldam Oct 17 '18 at 08:13
  • @Logan Can't argue that. – Michael Puckett II Oct 17 '18 at 19:07

Does IEnumerable<T> store a function to be called later?

Yes. An IEnumerable is exactly what it says it is. It's something which can be enumerated through at some future point. You can think of it like setting up a pipeline of operations.

It's not until it is actually enumerated (i.e. by foreach, .ElementAt(), ToList(), etc.) that any of those operations are invoked. This is called deferred execution.
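
The pipeline idea can be sketched with two counters, one per stage. The `PipelineSketch` class and the counter names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PipelineSketch
{
    // Returns { total calls before enumeration, Where calls after, Select calls after }.
    public static int[] CountCalls()
    {
        int whereCalls = 0, selectCalls = 0;
        int[] numbers = { 1, 2, 3, 4 };

        // Setting up the pipeline: neither lambda runs yet.
        IEnumerable<int> pipeline = numbers
            .Where(n => { whereCalls++; return n % 2 == 0; })
            .Select(n => { selectCalls++; return n * n; });

        int before = whereCalls + selectCalls;   // still 0

        List<int> results = pipeline.ToList();   // enumeration happens here

        return new[] { before, whereCalls, selectCalls };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", CountCalls())); // prints 0, 4, 2
    }
}
```

The filter sees all four numbers, but the projection only sees the two that pass it, and neither runs until ToList() pulls items through.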

what exactly causes the code within the Select() to be called from later lines?

When you call SomeEnumerable.Select(SomeOperation), the result is an IEnumerable which is an object representing that "pipeline" which you have set up. The implementation of that IEnumerable does store the function which you passed to it. In the actual source for this (for .NET Core), you can see that SelectEnumerableIterator, SelectListIterator, and SelectArrayIterator all have a Func<TSource, TResult> as a private field. This is where the function you specified is stored for later use. The array and list iterators just provide some shortcuts for when the source is known to be an array or a list.
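
As a heavily simplified illustration of that idea, here is a sketch of a class that keeps the function in a private field and only calls it during enumeration. `SelectSketch` and `SelectSketchDemo` are hypothetical names; the real iterators in .NET Core are more elaborate than this.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified illustration only, not the actual .NET Core code.
sealed class SelectSketch<TSource, TResult> : IEnumerable<TResult>
{
    private readonly IEnumerable<TSource> _source;
    private readonly Func<TSource, TResult> _selector;   // the stored function

    public SelectSketch(IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        _source = source;
        _selector = selector;
    }

    public IEnumerator<TResult> GetEnumerator()
    {
        foreach (TSource item in _source)
            yield return _selector(item);   // invoked only during enumeration
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class SelectSketchDemo
{
    public static int[] Run()
    {
        int[] numbers = { 1, 2, 3 };
        var query = new SelectSketch<int, int>(numbers, n => n * 10);

        numbers[0] = 5;                     // changed after the query object was built

        var seen = new List<int>();
        foreach (int value in query) seen.Add(value);
        return seen.ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints 50, 20, 30
    }
}
```

Because the object holds the source and the function rather than any results, the change to numbers[0] is visible when the query is finally enumerated.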

Eric Damtoft