169

This question already has an answer here:
Is there ever a reason to not use 'yield return' when returning an IEnumerable?

There are several useful questions here on SO about the benefits of yield return. For example,

I'm looking for thoughts on when NOT to use yield return. For example, if I expect to need to return all items in a collection, it doesn't seem like yield would be useful, right?

What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?

Community
  • 1
  • 1
Lawrence P. Kelley
  • 4,276
  • 3
  • 28
  • 24
  • 3
    There are a multitude of ways of doing things wrong, it's just an imagination exercise. I would rephrase your question as: What are the common improper usages of yield return? – Jader Dias Oct 19 '10 at 17:28
  • Programmers need to exercise imagination as much as in any other field. – jnm2 Oct 29 '13 at 14:24
  • 4
    This question is marked as a duplicate, yet no link to the duplicated question is provided... Should this be un-duplicated? – einsteinsci Mar 02 '15 at 06:29
  • 3
    This is an important question with interesting and useful answers, it should be reopened. – Colonel Panic May 18 '15 at 09:13

11 Answers11

155

What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?

It's a good idea to think carefully about your use of "yield return" when dealing with recursively defined structures. For example, I often see this:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    if (root == null) yield break;
    yield return root.Value;
    foreach(T item in PreorderTraversal(root.Left))
        yield return item;
    foreach(T item in PreorderTraversal(root.Right))
        yield return item;
}

Perfectly sensible-looking code, but it has performance problems. Suppose the tree is h deep. Then there will at most points be O(h) nested iterators built. Calling "MoveNext" on the outer iterator will then make O(h) nested calls to MoveNext. Since it does this O(n) times for a tree with n items, that makes the algorithm O(hn). And since the height of a binary tree is lg n <= h <= n, that means that the algorithm is at best O(n lg n) and at worst O(n^2) in time, and best case O(lg n) and worse case O(n) in stack space. It is O(h) in heap space because each enumerator is allocated on the heap. (On implementations of C# I'm aware of; a conforming implementation might have other stack or heap space characteristics.)

But iterating a tree can be O(n) in time and O(1) in stack space. You can write this instead like:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    var stack = new Stack<Tree<T>>();
    stack.Push(root);
    while (stack.Count != 0)
    {
        var current = stack.Pop();
        if (current == null) continue;
        yield return current.Value;
        stack.Push(current.Left);
        stack.Push(current.Right);
    }
}

which still uses yield return, but is much smarter about it. Now we are O(n) in time and O(h) in heap space, and O(1) in stack space.

Further reading: see Wes Dyer's article on the subject:

http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • 1
    About the first algo: You said it's O(1) in heapspace. Shouldn't it be O(h) in heapspace? (and O(n) in allocated objects over time) – CodesInChaos Oct 19 '10 at 17:14
  • @CodeInChaos: It's O(h) in heap space, yes, that was a typo. (Because the enumerators are allocated on the heap.) – Eric Lippert Oct 19 '10 at 19:01
  • Oh dear (see what i did there) ... my brain fuzzed over on the second time O(n) was typed. I'm just too blond to understand all this `Order` stuff ... #fml – Pure.Krome Oct 19 '10 at 23:36
  • 13
    I keep hoping to hear about a `yield foreach` in the next version of C#... – Gabe Oct 20 '10 at 08:20
  • 1
    Stephen Toub has an article ( http://blogs.msdn.com/b/toub/archive/2004/10/29/249858.aspx ) discussing this specific example, as well as a Towers of Hanoi puzzle solver that uses both methods of iteration in order to demonstrate the performance difference. – Brian Oct 22 '10 at 13:26
  • 1
    @EricLippert I suggest that you add a condition to check null values before pushing to avoid empty leaves traversal `if(current.Right != null) stack.Push(current.Right); if (current.Left != null) stack.Push(current.Left);` but I still don't see how you've optimized it by adding your own stack there. both are still using the yield return which will run the same way. can you explain? – CME64 Feb 14 '15 at 20:28
  • @CME64: Where the null check goes is not particularly relevant. To answer the question of how it is optimized, try it both ways with a deep tree, put a breakpoint on all the yields, and simply count the number of yields that happen, and you will soon see why the form with the stack is more efficient. – Eric Lippert Feb 14 '15 at 20:47
  • @EricLippert the null checks would be important when you talk about optimization, at least to avoid the unnecessary pushes and pops. about that, I wouldn't use break points but rather a counter, which I did before posting the first comment and both methods returned the same result (after adding the null checks) (tested with full BT height = 3, the same will apply for h=x) except that the later returned a higher number of calls because of the extra null nodes (before the null checks). I also tested full and partial traversal and the result was the same. kindly explain it as an algorithm TQ – CME64 Feb 15 '15 at 10:29
  • 1
    @CME64: Instead of a full binary tree, try both the first algorithm I posted and the second with a binary tree with 100 nodes where every right-hand node is null, ie, a maximally unbalanced binary tree. You'll find that in the first algorithm yield return is called *thousands* of times, and in the second, *hundreds*. Do you see why that is? – Eric Lippert Feb 15 '15 at 14:28
  • @EricLippert you're right, the calls to the yield are crazy in the first method, I already noticed that the loop is overlapping but was convinced otherwise after seeing the same result, I was wrong in that deduction and my counter was counting method calls not the return calls. Thank you for clarifying :-) – CME64 Feb 16 '15 at 04:38
  • @EricLippert Shouldn't you be pushing the right node first for a preorder traversal if you're using a stack, so you traverse the left node first when you pop? – Anthony Wieser May 22 '15 at 20:29
  • @WieserSoftwareLtd: This is a good observation that I did not call out here, yes. If you care about the exact order of traversal then you need to carefully make sure you get the pushes and yields in the right places. In many cases though all we want is to know that we saw all the nodes. – Eric Lippert May 22 '15 at 21:03
63

What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?

I can think of a couple of cases, IE:

  • Avoid using yield return when you return an existing iterator. Example:

    // Don't do this, it creates overhead for no reason
    // (a new state machine needs to be generated)
    public IEnumerable<string> GetKeys() 
    {
        foreach(string key in _someDictionary.Keys)
            yield return key;
    }
    // DO this
    public IEnumerable<string> GetKeys() 
    {
        return _someDictionary.Keys;
    }
    
  • Avoid using yield return when you don't want to defer execution code for the method. Example:

    // Don't do this, the exception won't get thrown until the iterator is
    // iterated, which can be very far away from this method invocation
    public IEnumerable<string> Foo(Bar baz) 
    {
        if (baz == null)
            throw new ArgumentNullException();
         yield ...
    }
    // DO this
    public IEnumerable<string> Foo(Bar baz) 
    {
        if (baz == null)
            throw new ArgumentNullException();
         return new BazIterator(baz);
    }
    
Wai Ha Lee
  • 8,598
  • 83
  • 57
  • 92
Pop Catalin
  • 61,751
  • 23
  • 87
  • 115
  • 25
    +1 for deferred execution = deferred exception if the code throws. – Davy8 Oct 19 '10 at 16:39
  • 1
    Although you're generally right I don't agree that there is no reason to use `foreach` together with `yield return` - e.g. when you have private collection, returning the collection itself would allow the end user to modify it (with a proper cast), whereas the first method would not. – Grx70 Jul 19 '15 at 09:04
  • 1
    @Grx70 So return your list with `.AsReadOnly()` as an `IReadOnlyCollection`. Problem solved. – ErikE Sep 18 '15 at 22:15
34

The key thing to realize is what yield is useful for, then you can decide which cases do not benefit from it.

In other words, when you do not need a sequence to be lazily evaluated you can skip the use of yield. When would that be? It would be when you do not mind immediately having your entire collection in memory. Otherwise, if you have a huge sequence that would negatively impact memory, you would want to use yield to work on it step by step (i.e., lazily). A profiler might come in handy when comparing both approaches.

Notice how most LINQ statements return an IEnumerable<T>. This allows us to continually string different LINQ operations together without negatively impacting performance at each step (aka deferred execution). The alternative picture would be putting a ToList() call in between each LINQ statement. This would cause each preceding LINQ statement to be immediately executed before performing the next (chained) LINQ statement, thereby forgoing any benefit of lazy evaluation and utilizing the IEnumerable<T> till needed.

Ahmad Mageed
  • 94,561
  • 19
  • 163
  • 174
27

There are a lot of excellent answers here. I would add this one: Don't use yield return for small or empty collections where you already know the values:

IEnumerable<UserRight> GetSuperUserRights() {
    if(SuperUsersAllowed) {
        yield return UserRight.Add;
        yield return UserRight.Edit;
        yield return UserRight.Remove;
    }
}

In these cases the creation of the Enumerator object is more expensive, and more verbose, than just generating a data structure.

IEnumerable<UserRight> GetSuperUserRights() {
    return SuperUsersAllowed
           ? new[] {UserRight.Add, UserRight.Edit, UserRight.Remove}
           : Enumerable.Empty<UserRight>();
}

Update

Here's the results of my benchmark:

Benchmark Results

These results show how long it took (in milliseconds) to perform the operation 1,000,000 times. Smaller numbers are better.

In revisiting this, the performance difference isn't significant enough to worry about, so you should go with whatever is the easiest to read and maintain.

Update 2

I'm pretty sure the above results were achieved with compiler optimization disabled. Running in Release mode with a modern compiler, it appears performance is practically indistinguishable between the two. Go with whatever is most readable to you.

StriplingWarrior
  • 151,543
  • 27
  • 246
  • 315
  • 1
    Is this really slower, though? I would imagine building the array would be just as slow if not slower. – PRMan Sep 29 '15 at 00:40
  • 1
    @PRMan: Yeah, I can see how you might think that. I updated my answer with a benchmark to show the difference. I don't know if my original testing wasn't done properly, or if the .NET framework improved performance since I first answered this, but the performance difference is not nearly as big as I remember it being--certainly not big enough to worry about in most situations. – StriplingWarrior Sep 29 '15 at 18:34
  • 1
    It seems using properties and not constants in the tests yields different results (pun intended). In release mode at least, Invoking and iterating over the yield result based method is faster. – Melvyn Oct 03 '18 at 16:09
  • @Yaurthek: Can you provide a code sample to show what you mean? I'm [seeing similar results](http://share.linqpad.net/ks52eb.linq) as before from returning property accesses: Yield return is much slower when unoptimized and slightly slower in release mode. – StriplingWarrior Oct 03 '18 at 17:14
  • @StriplingWarrior I suspect your implementation is optimized away. [Try this](http://share.linqpad.net/iqleua.linq) in release mode. (I increased the number of iterations because I was not getting stable results otherwise) – Melvyn Oct 05 '18 at 13:04
  • Interesting. I'm still not getting consistent results: yield return looked slightly faster a few times, then arrays looked faster, then they looked to be equal. Cranking it up to 500_000_000 iterations, yield return does appear to be about 4% faster, both for your test and mine. – StriplingWarrior Oct 05 '18 at 15:21
19

Eric Lippert raises a good point (too bad C# doesn't have stream flattening like Cw). I would add that sometimes the enumeration process is expensive for other reasons, and therefore you should use a list if you intend to iterate over the IEnumerable more than once.

For example, LINQ-to-objects is built on "yield return". If you've written a slow LINQ query (e.g. that filters a large list into a small list, or that does sorting and grouping), it may be wise to call ToList() on the result of the query in order to avoid enumerating multiple times (which actually executes the query multiple times).

If you are choosing between "yield return" and List<T> when writing a method, consider: is each single element expensive to compute, and will the caller need to enumerate the results more than once? If you know the answers are yes and yes, you shouldn't use yield return (unless, for example, the List produced is very large and you can't afford the memory it would use. Remember, another benefit of yield is that the result list doesn't have to be entirely in memory at once).

Another reason not to use "yield return" is if interleaving operations is dangerous. For example, if your method looks something like this,

IEnumerable<T> GetMyStuff() {
    foreach (var x in MyCollection)
        if (...)
            yield return (...);
}

this is dangerous if there is a chance that MyCollection will change because of something the caller does:

foreach(T x in GetMyStuff()) {
    if (...)
        MyCollection.Add(...);
        // Oops, now GetMyStuff() will throw an exception
        // because MyCollection was modified.
}

yield return can cause trouble whenever the caller changes something that the yielding function assumes does not change.

Qwertie
  • 16,354
  • 20
  • 105
  • 148
7

I would avoid using yield return if the method has a side effect that you expect on calling the method. This is due to the deferred execution that Pop Catalin mentions.

One side effect could be modifying the system, which could happen in a method like IEnumerable<Foo> SetAllFoosToCompleteAndGetAllFoos(), which breaks the single responsibility principle. That's pretty obvious (now...), but a not so obvious side effect could be setting a cached result or similar as an optimisation.

My rules of thumb (again, now...) are:

  • Only use yield if the object being returned requires a bit of processing
  • No side effects in the method if I need to use yield
  • If have to have side effects (and limiting that to caching etc), don't use yield and make sure the benefits of expanding the iteration outweigh the costs
Community
  • 1
  • 1
Rebecca Scott
  • 2,421
  • 2
  • 25
  • 39
  • 2
    This should be the number one answer for "when not to use". Consider a `RemoveAll` method that returns an `IEnumerable`. If you use `yield return Remove(key)`, then if the caller never iterates, the items will never be removed! – Bruce Pierson Mar 01 '15 at 05:12
  • This is a good main reason that is also easy to keep in mind. You can also consider that potentially thrown exceptions are also side effects. They will also be deferred. This, and the case where you already have an enumerable, like a collection of keys. Then just return the collection already. :) Lazy eval won't give you anything there. – Jonas Jan 09 '19 at 10:42
6

Yield would be limiting/unnecessary when you need random access. If you need to access element 0 then element 99, you've pretty much eliminated the usefulness of lazy evaluation.

Robert Gowland
  • 7,677
  • 6
  • 40
  • 58
  • 2
    When you need random access, IEnumerable can't help you. How would you access element 0 or 99 of an IEnumerable? Guess I don't see what you're trying to say – quentin-starin Oct 19 '10 at 19:05
  • 1
    @qstarin, exactly! The only way to access element 99 is to go through elements 0-98, so lazy evaluation has gained you nothing unless you only needed item 99 out of 2 billion. I'm not saying that you can access `enumberable[99]` I'm saying that if you were only interested in the 99th element, enumerable is not the way to go. – Robert Gowland Oct 19 '10 at 19:34
  • 3
    that has nothing at all to do with yield. It is inherent to IEnumerator, whether it is implemented using iterator blocks or not. – quentin-starin Oct 19 '10 at 20:17
  • 1
    @qstarin, it does have *something* to do with yield since yield will result in an enumerator. The OP asked when to avoid yield, yield results in an enumerator, using an enumerator for random access is unwieldy, therefore using yield when random access is required is a bad idea. The fact that he could have generated an enumerable in a different way doesn't negate the fact that using yield isn't good. You could shoot a man with a gun, or you could hit a man with a bat... the fact that you can kill a man with a bat doesn't negate that you shouldn't have shot him. – Robert Gowland Oct 19 '10 at 20:30
  • @qstarin, however, you are right to point out that there are other ways to generate IEnumerator. – Robert Gowland Oct 19 '10 at 20:32
6

One that might catch you out is if you are serialising the results of an enumeration and sending them over the wire. Because the execution is deferred until the results are needed, you will serialise an empty enumeration and send that back instead of the results you want.

Aidan
  • 4,783
  • 5
  • 34
  • 58
3

I have to maintain a pile of code from a guy who was absolutely obsessed with yield return and IEnumerable. The problem is that a lot of third party APIs we use, as well as a lot of our own code, depend on Lists or Arrays. So I end up having to do:

IEnumerable<foo> myFoos = getSomeFoos();
List<foo> fooList = new List<foo>(myFoos);
thirdPartyApi.DoStuffWithArray(fooList.ToArray());

Not necessarily bad, but kind of annoying to deal with, and on a few occasions it's led to creating duplicate Lists in memory to avoid refactoring everything.

Mike Ruhlin
  • 3,546
  • 2
  • 21
  • 31
2

When you don't want a code block to return an iterator for sequential access to an underlying collection, you dont need yield return. You simply return the collection then.

explorer
  • 11,710
  • 5
  • 32
  • 39
  • 2
    Think about returning it in a read-only wrapper. The caller might cast it back to the original collection type and modify it. – billpg Oct 19 '10 at 19:35
0

If you're defining a Linq-y extension method where you're wrapping actual Linq members, those members will more often than not return an iterator. Yielding through that iterator yourself is unnecessary.

Beyond that, you can't really get into much trouble using yield to define a "streaming" enumerable that is evaluated on a JIT basis.

KeithS
  • 70,210
  • 21
  • 112
  • 164