3

I am trying to understand why the following code behaves as it does:

using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApplication
{
    internal class Program
    {
        private static IEnumerable<int> Foo()
        {
            Console.WriteLine("Hello world!");
            for (var i = 0; i < 10; i++)
                yield return i;
        }

        public static void Main(string[] args)
        {
            var x = Foo();

            var y = x.Take(3);
            foreach (var i in y)
                Console.WriteLine(i);

            var z = x.Skip(3);
            foreach (var i in z)
                Console.WriteLine(i);
        }
    }
}

In Main I get a new "generator" (excuse the Python terminology) called x, and then I try to enumerate it twice. The first time I take the first 3 elements and print them: I expect the program to print "Hello world!" followed by the numbers from 0 to 2. The second time I take the same enumerable and skip 3 more elements. I expect the program to print the numbers from 6 to 9, without printing "Hello world!" first.

Instead, the second time the program prints "Hello world!" a second time and then starts enumerating from 3, up to 9.

I don't understand: why is Foo() called twice?

EDIT1: Not a duplicate. The linked "duplicate" asks about best practices when writing an enumerable; here I have trouble consuming its values.

EDIT2: I accepted an answer that honestly was not great, but contained a link to a reference that made me understand the problem. What was confusing to me was how the foreach loop actually works when looping over an IEnumerable, so I will explain it here for future reference.

The IEnumerable is the sequence of values. The underlying implementation is irrelevant, as long as it is something that can give you values in order. When you use a foreach loop on it, the loop calls GetEnumerator() on the sequence, which creates an enumerator object: basically a cursor that keeps track of where the loop has arrived.

So what is happening is that the first loop creates a first enumerator, which in turn must start my generator method and that prints the first string.

The second loop cannot pick up the previous enumerator (as I believed it did). Instead, it instantiates a second Enumerator, which in turn re-starts the generator and thus prints the second string.
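
To make the "one cursor per loop" point concrete, here is a rough sketch of what a foreach loop expands to. The names ForeachDemo, ManualForeach and BodyStarts are made up for this illustration, and the expansion is simplified compared to what the compiler really emits.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
    // Hypothetical counter: how many times the body of Foo has started.
    public static int BodyStarts;

    public static IEnumerable<int> Foo()
    {
        BodyStarts++;
        Console.WriteLine("Hello world!");
        for (var i = 0; i < 3; i++)
            yield return i;
    }

    // Roughly what `foreach (var i in source) Console.WriteLine(i);` does.
    public static void ManualForeach(IEnumerable<int> source)
    {
        // Every loop asks the sequence for its own cursor.
        using (IEnumerator<int> cursor = source.GetEnumerator())
        {
            while (cursor.MoveNext())
                Console.WriteLine(cursor.Current);
        }
    }

    public static void Main()
    {
        var x = Foo();      // nothing is printed yet
        ManualForeach(x);   // first cursor: Hello world! 0 1 2
        ManualForeach(x);   // second cursor: Hello world! 0 1 2 again
    }
}
```

Both loops receive the same x, but each one gets its own enumerator, so the body of Foo starts twice.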

To any Pythonista reading this: be warned that this is the opposite of how generators work in Python, where a generator object is its own iterator and looping over it a second time does NOT restart it.
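
If you do want the Python-like behaviour in C#, one option is to hold on to a single enumerator yourself instead of letting each foreach create a new one. This is only a sketch: SharedCursorDemo and the TakeFrom helper are hypothetical names invented for this example, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;

public static class SharedCursorDemo
{
    public static IEnumerable<int> Foo()
    {
        Console.WriteLine("Hello world!");
        for (var i = 0; i < 10; i++)
            yield return i;
    }

    // Hypothetical helper: pulls up to `count` items from an existing
    // cursor and leaves the cursor where it stopped.
    public static List<int> TakeFrom(IEnumerator<int> cursor, int count)
    {
        var taken = new List<int>();
        // Check the count first so no extra element is consumed.
        while (taken.Count < count && cursor.MoveNext())
            taken.Add(cursor.Current);
        return taken;
    }

    public static void Main()
    {
        using (var cursor = Foo().GetEnumerator())
        {
            // Prints "Hello world!" and then "0, 1, 2".
            Console.WriteLine(string.Join(", ", TakeFrom(cursor, 3)));
            // Discards 3, 4, 5 from the same cursor.
            TakeFrom(cursor, 3);
            // Prints "6, 7, 8, 9" with no second "Hello world!".
            Console.WriteLine(string.Join(", ", TakeFrom(cursor, int.MaxValue)));
        }
    }
}
```

This gives the output I originally expected, at the cost of managing the enumerator by hand.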

Michele Ippolito
  • Possible duplicate of [Is there ever a reason to not use 'yield return' when returning an IEnumerable?](https://stackoverflow.com/questions/3856625/is-there-ever-a-reason-to-not-use-yield-return-when-returning-an-ienumerable) – Fabiano Jun 30 '17 at 13:17
  • I disagree about it being a duplicate. That question asks about best practices when writing an iterable; I am trying to understand why my generator is being reset even though I don't expect it to. – Michele Ippolito Jun 30 '17 at 13:22
  • _x_ doesn't save a state between subsequent calls, you always get the full enumeration generated by _Foo()_ (including side effects). – Helmut D Jun 30 '17 at 13:31
  • @HelmutD, thanks, that ended up being the reason. I come from Python, where it is the generator itself that maintains its state. – Michele Ippolito Jun 30 '17 at 13:40

2 Answers

2

In C#, when you use the yield keyword in a method or property that returns IEnumerable or IEnumerable<T>, the method (or property) becomes a lazy sequence. Calling it does not build a list or array-like structure, and it does not run any of the body yet. The body starts only when the calling code (your Main method) pulls the first element; it then runs to the first yield return, pauses, and resumes each time the next element is pulled.

So when you write x = Foo() you are setting x equal to a lazy sequence that executes the body of Foo from the start every time you loop over it. Any side effects in the body of Foo will be executed once per enumeration, that is, once for each foreach over x or over anything built from x.
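
The laziness can be observed directly by recording side effects in a list instead of printing them. This is a small sketch; DeferredDemo and its Log list are names made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Hypothetical log of the side effects produced by the body of Foo.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Foo()
    {
        Log.Add("body started");
        for (var i = 0; i < 10; i++)
        {
            Log.Add("yield " + i);
            yield return i;
        }
    }

    public static void Main()
    {
        var x = Foo();
        Console.WriteLine(Log.Count);   // 0: calling Foo ran none of its body

        using (var e = x.GetEnumerator())
        {
            Console.WriteLine(Log.Count);   // 0: getting a cursor runs nothing either
            e.MoveNext();                   // first pull: runs up to the first yield return
            Console.WriteLine(string.Join(" | ", Log));   // body started | yield 0
        }
    }
}
```

Only the MoveNext call moves the body forward, and only as far as the next yield return.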

Skip and Take each take a lazy sequence as input and yield return elements as needed, resulting in a new lazy sequence. So when you write y = x.Take(3), nothing is iterated at that point, but when you foreach over y, it will execute Foo only until the third element has been yielded.

Skip source code: https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Skip.cs

Likewise z = x.Skip(3) makes z a third lazy sequence, created by composing x with the Skip operation. Nothing is iterated here immediately, but when you foreach z, it will execute the full body of Foo, but will "throw out" the first 3 elements.
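
As a rough illustration of how such operators compose, here is a simplified sketch of Take and Skip written as iterator blocks. MyTake and MySkip are hypothetical stand-ins; the real LINQ implementations linked above are more elaborate (argument checks, optimizations), so treat this as an approximation of the idea only.

```csharp
using System;
using System.Collections.Generic;

public static class MiniLinq
{
    // Simplified stand-in for Enumerable.Take (not the real implementation).
    public static IEnumerable<T> MyTake<T>(this IEnumerable<T> source, int count)
    {
        if (count <= 0) yield break;
        var taken = 0;
        foreach (var item in source)   // starts a fresh enumeration of source
        {
            yield return item;
            if (++taken == count) yield break;   // stop without pulling more
        }
    }

    // Simplified stand-in for Enumerable.Skip (not the real implementation).
    public static IEnumerable<T> MySkip<T>(this IEnumerable<T> source, int count)
    {
        var skipped = 0;
        foreach (var item in source)   // also starts a fresh enumeration
        {
            if (skipped < count) { skipped++; continue; }   // throw the item away
            yield return item;
        }
    }
}

public static class MiniLinqDemo
{
    public static IEnumerable<int> Foo()
    {
        Console.WriteLine("Hello world!");
        for (var i = 0; i < 10; i++)
            yield return i;
    }

    public static void Main()
    {
        var x = Foo();
        var y = x.MyTake(3);   // nothing runs yet
        var z = x.MySkip(3);   // nothing runs yet
        Console.WriteLine(string.Join(", ", y));   // Hello world! then 0, 1, 2
        Console.WriteLine(string.Join(", ", z));   // Hello world! then 3, 4, 5, 6, 7, 8, 9
    }
}
```

Each operator contains its own foreach over the source, so enumerating y and enumerating z each start Foo from the beginning.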

See Jon Skeet's book: http://csharpindepth.com/Articles/Chapter6/IteratorBlockImplementation.aspx

JamesFaix
  • That is perfectly clear to me... but my state machine is *not* being disposed. I still have a valid reference to it. I am not doing `z = Foo().Skip(3)`, I am doing `z = x.Skip(3)` where x has already been iterated over. – Michele Ippolito Jun 30 '17 at 13:23
  • `Foo` returns the iterator state machine, so `x` becomes that machine, not the values. The iterator executes the body of `Foo` each time. – JamesFaix Jun 30 '17 at 13:25
  • Or rather, `Foo` returns the `IEnumerable` that builds that state machine when `GetEnumerator` is called. `IEnumerator` is the state machine. – JamesFaix Jun 30 '17 at 13:26
  • @JamesFaix the explanation is wrong, it's the `foreach` that invokes the iteration, not `Skip` or `Take` – Oliver Jun 30 '17 at 13:26
  • How do you think `Skip` and `Take` are implemented? https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Skip.cs – JamesFaix Jun 30 '17 at 13:27
  • With deferred execution - remove the foreaches and Foo won't be invoked at all – Oliver Jun 30 '17 at 13:27
  • `Skip` and `Take` operate on one lazy sequence (the `Foo()` result) and then `yield return` values as needed, creating a new lazy sequence. So when you `foreach` and iterate the result of those methods, you are iterating a chain/pipe of state machines basically. – JamesFaix Jun 30 '17 at 13:29
  • Exactly - therefore "When you write y = x.Take(3) the body of Foo is iterated up to 3 elements and then the state machine is disposed of" is incorrect – Oliver Jun 30 '17 at 13:30
0

Because it is an `IEnumerable`. This means that the iterator (your method body) runs again each time you enumerate it. If you do not want it to run twice, then you should materialize it:

var x = Foo().ToArray();
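
A minimal sketch of how that plays out with the code from the question (MaterializeDemo and the BodyStarts counter are names added only for this illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    // Hypothetical counter: how many times the body of Foo has started.
    public static int BodyStarts;

    public static IEnumerable<int> Foo()
    {
        BodyStarts++;
        Console.WriteLine("Hello world!");
        for (var i = 0; i < 10; i++)
            yield return i;
    }

    public static void Main()
    {
        // ToArray enumerates Foo once, right here, and stores the values.
        int[] x = Foo().ToArray();

        // Both queries now read from the array; Foo is not run again.
        Console.WriteLine(string.Join(", ", x.Take(3)));   // 0, 1, 2
        Console.WriteLine(string.Join(", ", x.Skip(3)));   // 3, 4, 5, 6, 7, 8, 9
    }
}
```

Note that Skip(3) still counts from the start of the array, so this prints 3 to 9. Materializing removes the repeated side effect, but it does not turn x into a cursor that remembers its position.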

Pablo notPicasso