22

I have long found yield tough to understand, but I am finally getting the hang of it. In my project, if I return List, Microsoft code analysis gives a warning about it. So I normally do all the necessary logic and return the list as IEnumerable. I want to know the difference between the two approaches, that is, between using yield return and returning a list.

Here is a very simple example; my real code is usually a bit more complicated.

private static IEnumerable<int> getIntFromList(List<int> inputList)
{
    var outputlist = new List<int>();
    foreach (var i in inputList)
    {
        if (i % 2 == 0)
        {
            outputlist.Add(i);
        }
    }

    return outputlist.AsEnumerable();
}

private static IEnumerable<int> getIntFromYeild(List<int> inputList)
{
    foreach (var i in inputList)
    {
        if (i % 2 == 0)
        {
            yield return i;
        }
    }
}

One significant benefit I can see is fewer lines. But is there any other benefit? Should I update my functions that return IEnumerable to use yield instead of List? What is the best, or at least a better, way to do this?

In this case I could use a simple lambda expression over the List, but normally that is not possible; this example is only meant to help me understand the best coding approach.

Peter O.
kunjee
  • Related question: http://stackoverflow.com/questions/410026/proper-use-of-yield-return – Leri Jan 21 '13 at 07:29

5 Answers

47

Your first example is still doing all the work eagerly and building up a list in memory. In fact, the call to AsEnumerable() is pointless - you might as well use:

return outputlist;

Your second example is lazy - it only does as much work as it needs to as the client pulls data from it.

The simplest way to show the difference is probably to put a Console.WriteLine call inside the if (i % 2 == 0) statement:

Console.WriteLine("Got a value to return: " + i);

Then if you also put a Console.WriteLine call in the client code, e.g.

foreach (int value in getIntFromList(list))
{
    Console.WriteLine("Received value: " + value);
}

... you'll see that with your first code, you see all the "Got a value" lines first, then all the "Received value" lines. With the iterator block, you'll see them interleaved.

Now imagine that your code is actually doing something expensive, and your list is very long, and the client only wants the first 3 values... with your first code, you'd be doing a load of irrelevant work. With the lazy approach, you only do as much work as you need to, in a "just in time" fashion. The second approach also doesn't need to buffer all the results up in memory - again, if the input list is very large, you'd end up with a large output list too, even if you only wanted to use a single value at a time.
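To see the ordering described above without editing your own methods, here is a minimal self-contained sketch. The names LazyVersusEagerDemo, EvensFromList, EvensFromYield, Log and Run are made up for this illustration; the two methods are meant to mirror the question's getIntFromList and getIntFromYeild, with a shared log standing in for the Console.WriteLine calls.

```csharp
using System;
using System.Collections.Generic;

public static class LazyVersusEagerDemo
{
    // Shared log (hypothetical helper) recording the order of events.
    public static readonly List<string> Log = new List<string>();

    // Eager: filters everything and buffers the results before returning.
    public static IEnumerable<int> EvensFromList(List<int> input)
    {
        var output = new List<int>();
        foreach (var i in input)
        {
            if (i % 2 == 0)
            {
                Log.Add("Got " + i);
                output.Add(i);
            }
        }
        return output;
    }

    // Lazy: each value is produced only when the caller asks for the next one.
    public static IEnumerable<int> EvensFromYield(List<int> input)
    {
        foreach (var i in input)
        {
            if (i % 2 == 0)
            {
                Log.Add("Got " + i);
                yield return i;
            }
        }
    }

    // Plays the role of the client code: consumes the sequence and logs each value.
    public static string Run(Func<List<int>, IEnumerable<int>> source)
    {
        Log.Clear();
        foreach (int value in source(new List<int> { 1, 2, 3, 4 }))
        {
            Log.Add("Received " + value);
        }
        return string.Join(", ", Log);
    }

    public static void Main()
    {
        Console.WriteLine(Run(EvensFromList));  // all "Got" entries first
        Console.WriteLine(Run(EvensFromYield)); // "Got" and "Received" interleaved
    }
}
```

With the list version the log reads "Got 2, Got 4, Received 2, Received 4"; with the iterator block it reads "Got 2, Received 2, Got 4, Received 4".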

Jon Skeet
  • Your ability to explain things to others via text is something I hope to achieve one day. – Simon Whitehead Jan 21 '13 at 07:44
  • @SimonWhitehead: Thanks - although I still have a long way to go. Practising regularly is the key :) – Jon Skeet Jan 21 '13 at 07:45
  • @JonSkeet Thanks for the dramatically quick reply. By the way, you are the reason I understand yield now (Tekpub series). Basically, I have created something like Rob Conery's Massive, but for Excel files. Internally it reads cell by cell, so after understanding yield a little better I thought I was doing it wrong, since I am using the first approach. What are your thoughts on that? That is, reading a massive Excel file and creating a collection of dynamic objects with the first approach... – kunjee Jan 21 '13 at 07:53
  • @kunjee: Well, it may well not be "wrong" - but it may be less efficient than it could be. It depends. The advantage of doing all the work up-front is that you can then close the Excel file, of course... but it does mean you've read the whole thing into memory. I think we'd need a lot more detail to be able to give more concrete advice, but hopefully this at least helps a bit :) – Jon Skeet Jan 21 '13 at 07:55
17

The key point about yield return is that it is not buffered; the iterator block is a state machine, that resumes as the data is iterated. This makes it handy for very large data-sources (or even infinite lists), since you can avoid having a huge in-memory list.

The following is a perfectly well defined iterator-block, that can be iterated successfully:

Random rand = new Random();
while(true) yield return rand.Next();

and we can do things like:

foreach(int i in TheAbove().Take(20))
    Console.WriteLine(i);

Although obviously, anything that iterates to the end (such as Count() etc) will run forever without ending - not a great idea.
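For anyone who wants to run the infinite example, here is one way the fragments above might be put together. This is only a sketch: the class name InfiniteSequenceDemo, the method name RandomNumbers and the seed parameter are assumptions added for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InfiniteSequenceDemo
{
    // An iterator block that never finishes on its own.
    // The seed parameter is an addition so that runs are repeatable.
    public static IEnumerable<int> RandomNumbers(int seed)
    {
        var rand = new Random(seed);
        while (true)
        {
            yield return rand.Next();
        }
    }

    public static void Main()
    {
        // Take(20) stops pulling after 20 items, so this loop terminates
        // even though the underlying sequence is endless.
        foreach (int i in RandomNumbers(42).Take(20))
        {
            Console.WriteLine(i);
        }
    }
}
```

Calling RandomNumbers(42).Count() instead would never return, which is the "not a great idea" case mentioned above.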

In your example, the code is probably over-complicated. The List<int> version could be just:

return new List<int>(inputList);

The yield return kinda depends on what you want to do: at the simplest, it could be just:

foreach(var item in inputList) yield return item;

although obviously that will still be looking at the source data: changes to inputList could break the iterator. If you think "that's fine", then frankly you might as well just:

return inputList;

If that isn't fine, in this case the iterator block is a bit overkill, and the:

return new List<int>(inputList);

should suffice.

For completeness: AsEnumerable just returns the original source, type cast; it is the:

return inputList;

version. This has an important consideration, in that it doesn't protect your lists, if that is a concern. So if you are thinking:

return someList.AsEnumerable(); // so they can only iterate it, not Add

then that will not work; an evil caller can still just do:

var list = (IList<int>) theAbove;
int mwahaahahaha = 42;
list.Add(mwahaahahaha);
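A small sketch of that point, with made-up names (ExposureDemo and its two helper methods). It also shows List<T>.AsReadOnly() as one possible alternative if protecting the list matters to you; that is a suggestion added here, not the only option. Note that the wrapper still reflects later changes to the underlying list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExposureDemo
{
    // AsEnumerable only changes the static type; the caller can cast back and Add.
    public static bool CastBackAndAdd(List<int> someList)
    {
        IEnumerable<int> exposed = someList.AsEnumerable();
        bool sameObject = object.ReferenceEquals(exposed, someList);
        ((IList<int>)exposed).Add(42); // mutates someList itself
        return sameObject;
    }

    // AsReadOnly returns a ReadOnlyCollection<int> wrapper whose Add throws.
    public static bool ReadOnlyWrapperRejectsAdd(List<int> someList)
    {
        IEnumerable<int> wrapped = someList.AsReadOnly();
        try
        {
            ((IList<int>)wrapped).Add(43);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };
        Console.WriteLine(CastBackAndAdd(list));            // same object, so Add succeeds
        Console.WriteLine(list.Count);                      // the list has grown by one
        Console.WriteLine(ReadOnlyWrapperRejectsAdd(list)); // Add is rejected
    }
}
```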
Marc Gravell
1

Big difference: the second (yield) creates less memory garbage. The first basically creates a copy of the list in memory.

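Big difference: if the caller modifies the original list while iterating the result of sample 2, the iteration will break (List<T> enumerators throw InvalidOperationException when the list changes under them); in sample 1 it will not, because the caller is iterating a copy.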

So the two versions are NOT identical; they only look identical if you ignore edge cases and side effects and consider just the straightforward case.
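A minimal sketch of that edge case, using made-up names (MutationDemo, EvensFromList, EvensFromYield, ThrowsWhenSourceChanges) that mirror the two samples in the question:

```csharp
using System;
using System.Collections.Generic;

public static class MutationDemo
{
    // Sample 1 style: results are copied into a new list up front.
    public static IEnumerable<int> EvensFromList(List<int> input)
    {
        var output = new List<int>();
        foreach (var i in input)
        {
            if (i % 2 == 0) output.Add(i);
        }
        return output;
    }

    // Sample 2 style: the original list is still being enumerated while the caller iterates.
    public static IEnumerable<int> EvensFromYield(List<int> input)
    {
        foreach (var i in input)
        {
            if (i % 2 == 0) yield return i;
        }
    }

    // Returns true if modifying the source during iteration breaks the loop.
    public static bool ThrowsWhenSourceChanges(Func<List<int>, IEnumerable<int>> filter)
    {
        var source = new List<int> { 2, 4, 6 };
        try
        {
            foreach (int value in filter(source))
            {
                source.Add(8); // the caller changes the original list mid-iteration
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsWhenSourceChanges(EvensFromList));  // copy: unaffected
        Console.WriteLine(ThrowsWhenSourceChanges(EvensFromYield)); // lazy: breaks
    }
}
```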

As a result, by the way, example 2 is generally faster because it does not allocate a second list.

TomTom
0

The difference is in the time of the execution.

In your first example, the code in your function is executed before the function exits. The whole list is created and then returned as IEnumerable.

In the second example, the code in the function doesn't actually run when the function is called. Instead, the call immediately returns an IEnumerable, and the code executes later, when you iterate through that IEnumerable.

In particular, if you only iterate through the first 3 elements of the IEnumerable in the second example, the foreach loop inside the function will only iterate enough times to produce three elements and no more.
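A rough way to observe this timing is to count how many input elements the function has looked at. In the sketch below, DeferredExecutionDemo and the Examined counter are additions for illustration; the method body otherwise follows the yield version from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many input elements the iterator body has looked at (illustrative only).
    public static int Examined;

    public static IEnumerable<int> EvensFromYield(List<int> input)
    {
        foreach (var i in input)
        {
            Examined++;
            if (i % 2 == 0)
            {
                yield return i;
            }
        }
    }

    public static void Main()
    {
        var input = Enumerable.Range(1, 100).ToList();

        Examined = 0;
        var evens = EvensFromYield(input);
        Console.WriteLine(Examined); // still 0: calling the method ran none of its body

        var firstThree = evens.Take(3).ToList();
        Console.WriteLine(string.Join(", ", firstThree)); // 2, 4, 6
        Console.WriteLine(Examined); // far fewer than the 100 elements in the input
    }
}
```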

Petar Ivanov
0

When you use yield, the compiler generates code for the iterator pattern, which can work faster than a pre-generated list. The generated code looks something like this:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading;

namespace Yield
{
    class UserCollection
    {
        public static IEnumerable Power()
        {
            return new ClassPower(-2);
        }

        private sealed class ClassPower : IEnumerable<object>, IEnumerable, IEnumerator<object>, IEnumerator, IDisposable
        {

            private int state;
            private object current;
            private int initialThreadId;

            public ClassPower(int state)
            {
                this.state = state;
                this.initialThreadId = Thread.CurrentThread.ManagedThreadId;
            }

            bool IEnumerator.MoveNext()
            {
                switch (this.state)
                {
                    case 0:
                        this.state = -1;
                        this.current = "Hello world!";
                        this.state = 1;
                        return true;

                    case 1:
                        this.state = -1;
                        break;
                }
                return false;
            }

            IEnumerator<object> IEnumerable<object>.GetEnumerator()
            {
                if ((Thread.CurrentThread.ManagedThreadId == this.initialThreadId) && (this.state == -2))
                {
                    this.state = 0;
                    return this;
                }
                return new UserCollection.ClassPower(0);
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return (this as IEnumerable<object>).GetEnumerator();
            }

            void IEnumerator.Reset()
            {
                throw new NotSupportedException();
            }

            void IDisposable.Dispose()
            {
            }

            object IEnumerator<object>.Current
            {
                get
                {
                    return this.current;
                }
            }

            object IEnumerator.Current
            {
                get
                {
                    return this.current;
                }
            }
        }
    }

}

Alex