6

I'm running some tests on yield return performance, and I found that it is slower than a normal return.

I tested value types (int, double, etc.) and some reference types (string, etc.), and yield return was slower in both cases. Why use it, then?

Check out my example:

public class YieldReturnTeste
{
    private static IEnumerable<string> YieldReturnTest(int limite)
    {
        for (int i = 0; i < limite; i++)
        {
            yield return i.ToString();
        }
    }

    private static IEnumerable<string> NormalReturnTest(int limite)
    {
        List<string> listaInteiros = new List<string>();

        for (int i = 0; i < limite; i++)
        {
            listaInteiros.Add(i.ToString());
        }
        return listaInteiros;
    }

    public static void executaTeste()
    {
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        List<string> minhaListaYield = YieldReturnTest(2000000).ToList();
        stopWatch.Stop();

        TimeSpan ts = stopWatch.Elapsed;
        string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
            ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
        Console.WriteLine("Yield return: {0}", elapsedTime);

        //****

        stopWatch = new Stopwatch();
        stopWatch.Start();
        List<string> minhaListaNormal = NormalReturnTest(2000000).ToList();
        stopWatch.Stop();

        ts = stopWatch.Elapsed;
        elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
            ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
        Console.WriteLine("Normal return: {0}", elapsedTime);
    }
}
Peter Mortensen
Marcel James
  • 2
    Watch the memory consumption! The List-method consumes O(n) memory whereas the yield/Enumerator-method has O(1) memory consumption. For very large lists, this is crucial. And you can chain enumerables more easily without additional temporary lists. This is a more general discussion: http://stackoverflow.com/questions/3628425/ienumerable-vs-list-what-to-use-how-do-they-work – Thomas B. Aug 09 '13 at 11:55
  • Two issues: first, `NormalReturnTest` should pre-size its list to `limite`. Second, I'm fairly certain that `.ToList()`, when called on a `List`, has a special check that makes it go straight to the underlying array and perform an array copy rather than iterate the list and copy the items one by one, which produces a completely different result. By contrast, `.ToList()` on your `yield return` enumerable has to iterate every element and build the array (causing several resizes along the way to reach 2000000 elements). You are measuring the wrong thing. – Chris Sinclair Aug 09 '13 at 11:57
  • Another issue, it's just bad benchmarking in general. For example, you should perform _many_ runs of `ToList`, not a single run. Secondly the time difference (on my machine) is 670ms vs 690ms (and fluctuates greatly) which is far too low to read much from it as there are other unrelated processing issues that can alter its time. – Chris Sinclair Aug 09 '13 at 12:01
  • possible duplicate of [Using yield in C#](http://stackoverflow.com/questions/2696885/using-yield-in-c-sharp) – lesderid Aug 09 '13 at 12:12
  • @ThomasB that's true, the normal return method occupies more than double the memory of the yield method. I had not thought of that. – Marcel James Aug 09 '13 at 12:22
  • @ChrisSinclair yeah, I've changed it and the yield return method took about 00:00:00:00 to execute. The expensive part is the .ToList() conversion, as you said. – Marcel James Aug 09 '13 at 12:30
  • 1
    @WilnerAvila: Careful you don't eliminate the iteration entirely. Did you put a `foreach` loop on them instead? Because if you don't iterate it, the `yield return` won't actually _do anything_. – Chris Sinclair Aug 09 '13 at 12:39
  • @ChrisSinclair yes, I did that now; the yield method takes half the time of the normal method. Is yield return executed only when I explicitly iterate? – Marcel James Aug 09 '13 at 13:46
  • @WilnerAvila Yes; its execution is deferred, so the code in it runs up to the next `yield return` only when the `MoveNext` method is called during iteration. – Chris Sinclair Aug 09 '13 at 17:30
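
The deferred execution described in the last comments can be observed directly. The following is a minimal sketch, not code from the thread: the class `DeferredExecutionDemo`, the method `Numbers`, and the `Produced` counter are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredExecutionDemo
{
    // Hypothetical counter: how many values the iterator body has produced so far.
    public static int Produced = 0;

    public static IEnumerable<int> Numbers(int limit)
    {
        for (int i = 0; i < limit; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        // Calling the method only creates the iterator object; the loop body has not run yet.
        IEnumerable<int> sequence = Numbers(3);
        Console.WriteLine("After call: produced = " + Produced);

        using (IEnumerator<int> e = sequence.GetEnumerator())
        {
            // Each MoveNext runs the body up to the next yield return.
            e.MoveNext();
            Console.WriteLine("After first MoveNext: produced = " + Produced + ", current = " + e.Current);
            e.MoveNext();
            Console.WriteLine("After second MoveNext: produced = " + Produced + ", current = " + e.Current);
        }
    }
}
```

If the sequence is never enumerated, the counter stays at zero, which is why a benchmark that drops the iteration measures nothing.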

5 Answers

15

Consider the difference between File.ReadAllLines and File.ReadLines.

ReadAllLines loads all of the lines into memory and returns a string[]. All well and good if the file is small. If the file is larger than will fit in memory, you'll run out of memory.

ReadLines, on the other hand, returns the lines lazily, one at a time, the way a yield return iterator does. With it, you can read a file of any size, because it doesn't load the whole file into memory.

Say you wanted to find the first line that contains the word "foo", and then exit. Using ReadAllLines, you'd have to read the entire file into memory, even if "foo" occurs on the first line. With ReadLines, you only read one line. Which one would be faster?
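
As a rough sketch of that search, assuming a small temporary file created just for the demonstration (the class `FirstMatchDemo` and the helper `FindFirst` are hypothetical names, not part of the original answer):

```csharp
using System;
using System.IO;
using System.Linq;

public static class FirstMatchDemo
{
    // Returns the first line containing the given word, or null if no line does.
    public static string FindFirst(string path, string word)
    {
        // File.ReadLines is lazy, so enumeration stops as soon as a match is found.
        return File.ReadLines(path).FirstOrDefault(line => line.Contains(word));
    }

    public static void Main()
    {
        // Assumption for the demo: a throwaway temp file stands in for a large input file.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "foo is here", "bar", "baz" });
            Console.WriteLine(FindFirst(path, "foo"));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Swapping `File.ReadLines` for `File.ReadAllLines` would give the same result, but only after the whole file had been loaded into an array.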

That's not the only reason. Consider a program that reads a file and processes each line. Using File.ReadAllLines, you end up with:

string[] lines = File.ReadAllLines(filename);
for (int i = 0; i < lines.Length; ++i)
{
    // process line
}

The time it takes that program to execute is equal to the time it takes to read the file, plus time to process the lines. Imagine that the processing takes so long that you want to speed it up with multiple threads. So you do something like:

lines = File.ReadAllLines(filename);
Parallel.ForEach(...);

But the reading is single-threaded. Your multiple threads can't start until the main thread has loaded the entire file.

With ReadLines, though, you can do something like:

Parallel.ForEach(File.ReadLines(filename), line => { ProcessLine(line); });

That starts up multiple threads immediately, which are processing at the same time that other lines are being read. So the reading time is overlapped with the processing time, meaning that your program will execute faster.

I show my examples using files because it's easier to demonstrate the concepts that way, but the same holds true for in-memory collections. Using yield return will use less memory and is potentially faster, especially when calling methods that only need to look at part of the collection (Enumerable.Any, Enumerable.First, etc.).
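
For the in-memory case, here is a small sketch of how a method such as Enumerable.First stops the iterator early. The class `EarlyExitDemo`, the method `Squares`, and the `Evaluated` counter are hypothetical names used only to make the effect visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EarlyExitDemo
{
    // Hypothetical counter: how many elements the iterator actually computed.
    public static int Evaluated = 0;

    public static IEnumerable<int> Squares(int limit)
    {
        for (int i = 0; i < limit; i++)
        {
            Evaluated++;
            yield return i * i;
        }
    }

    public static void Main()
    {
        // First stops pulling from the iterator at the first match,
        // so only a handful of the 2000000 possible elements are computed.
        int first = Squares(2000000).First(sq => sq > 50);
        Console.WriteLine("First square above 50: " + first);   // 64
        Console.WriteLine("Elements evaluated: " + Evaluated);  // 9
    }
}
```

A version that built a full list first would compute all 2000000 squares before First could look at any of them.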

Jim Mischel
  • 1
    Just a question: if the "ReadLines" method doesn't put the whole file in memory, does it perform a disk access for every row that is read? – Marcel James Oct 31 '13 at 15:57
  • 2
    @MichelAlmeida: No. The underlying stream loads the file in blocks of 4 kilobytes or more, and then parses lines from that. So the number of disk reads is minimized. In addition, the operating system usually has some type of read ahead caching enabled, so when the stream requests more information it's already in memory and the "read" is just copying data from one memory location to another. – Jim Mischel Oct 31 '13 at 16:04
  • Hm, got it. Then it still does more than one disk access, but with the caching it is still better than reading the whole file and putting it all in memory, right? Note: Damn, I can't quote you oO – Marcel James Oct 31 '13 at 16:34
  • 1
    @MichelAlmeida: Even reading the entire file into memory will do more than one read. It might *look* like one read to you, but under the hood it's reading a block of data, parsing out the lines, reading another block, etc. – Jim Mischel Oct 31 '13 at 17:06
2

For one, it's a convenience feature. Two, it lets you do lazy return, which means that a value is only evaluated when it is fetched. That can be invaluable for something like a DB query, or for a collection you don't want to iterate over completely. Three, it can be faster in some scenarios. Four, what was the difference? Probably tiny, so this is micro-optimization.

It'sNotALie.
1

Because the C# compiler converts iterator blocks (yield return) into a state machine. The state machine is very expensive in this case.

You can read more here: http://csharpindepth.com/articles/chapter6/iteratorblockimplementation.aspx
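
To give a feel for what that means, here is a hand-written approximation of a state machine for the question's `YieldReturnTest` loop. It is a simplified sketch, not the code the compiler actually emits: the real generated class uses different names and extra bookkeeping, and `CountingIterator` is a hypothetical name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written stand-in for the iterator block:
//   for (int i = 0; i < limit; i++) yield return i.ToString();
public sealed class CountingIterator : IEnumerable<string>, IEnumerator<string>
{
    private readonly int _limit;
    private int _state;      // 0 = not started, 1 = running, -1 = finished
    private int _i;          // the loop variable, hoisted into a field
    private string _current;

    public CountingIterator(int limit) { _limit = limit; }

    public string Current { get { return _current; } }
    object IEnumerator.Current { get { return _current; } }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:          // first call: run the loop initializer
                _i = 0;
                _state = 1;
                break;
            case 1:          // later calls: resume after the yield, run the increment
                _i++;
                break;
            default:         // already finished
                return false;
        }

        if (_i < _limit)
        {
            _current = _i.ToString();
            return true;
        }

        _state = -1;
        return false;
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }

    // Simplification: every enumeration gets a fresh instance.
    public IEnumerator<string> GetEnumerator() { return new CountingIterator(_limit); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class StateMachineDemo
{
    public static void Main()
    {
        foreach (string s in new CountingIterator(3))
        {
            Console.WriteLine(s); // prints 0, 1, 2 on separate lines
        }
    }
}
```

Every element costs an interface call to MoveNext plus a state dispatch, which is the overhead being described here.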

oakio
0

I used yield return to give me results from an algorithm. Every result is based on the previous result, but I don't need all of them. I used foreach with yield return to inspect each result and break out of the loop as soon as a result met my requirement.

The algorithm was fairly complex, so I think a fair amount of work was involved in saving state between yield returns.

I noticed it was 3%-5% slower than a traditional return, but the improvement I got from not needing to generate all the results was much bigger than that loss of performance.
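
The pattern can be sketched with a deliberately trivial stand-in for the algorithm. Everything here is hypothetical (the class `EarlyBreakDemo`, the `Doubling` sequence, the threshold); the actual algorithm is not shown in this answer.

```csharp
using System;
using System.Collections.Generic;

public static class EarlyBreakDemo
{
    // Stand-in algorithm: each result is computed from the previous one.
    // The sequence is unbounded, so the caller decides when to stop.
    public static IEnumerable<long> Doubling(long seed)
    {
        long value = seed;
        while (true)
        {
            yield return value;
            value *= 2;
        }
    }

    // Inspects results one by one and stops at the first that meets the requirement.
    public static long FirstAbove(long threshold)
    {
        long found = -1;
        foreach (long result in Doubling(1))
        {
            if (result > threshold)
            {
                found = result;
                break; // the remaining results are never computed
            }
        }
        return found;
    }

    public static void Main()
    {
        Console.WriteLine(FirstAbove(1000)); // 1024
    }
}
```

With a traditional return, the method would have to decide up front how many results to generate, or generate far more than the caller needs.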

Morio
0

The .ToList() call, while necessary to actually complete the otherwise deferred iteration of the IEnumerable, gets in the way of measuring the core part.

At the very least, it is important to initialize the list to the known size:

const int listSize = 2000000; var tempList = new List<string>(listSize);

...

List<string> tempList = YieldReturnTest(listSize).ToList();

Remark: Both calls took about the same time on my machine. No difference (Mono 4 on repl.it).
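
A fairer comparison might look like the sketch below. It is one possible setup, not a rigorous benchmark: it iterates with foreach instead of calling .ToList(), pre-sizes the list, does a warm-up run, and averages several runs. The helper names `Consume` and `Measure` and the run count are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class BenchmarkSketch
{
    public static IEnumerable<string> YieldReturnTest(int limit)
    {
        for (int i = 0; i < limit; i++) yield return i.ToString();
    }

    public static IEnumerable<string> NormalReturnTest(int limit)
    {
        var list = new List<string>(limit); // pre-sized, so Add never triggers a resize
        for (int i = 0; i < limit; i++) list.Add(i.ToString());
        return list;
    }

    // Iterates the whole sequence and returns the element count,
    // so both variants do the same work on the consuming side.
    public static int Consume(IEnumerable<string> source)
    {
        int count = 0;
        foreach (string s in source) count++;
        return count;
    }

    // Average time per run, after one untimed warm-up run.
    public static TimeSpan Measure(Func<IEnumerable<string>> factory, int runs)
    {
        Consume(factory());
        Stopwatch sw = Stopwatch.StartNew();
        for (int r = 0; r < runs; r++) Consume(factory());
        sw.Stop();
        return TimeSpan.FromTicks(sw.Elapsed.Ticks / runs);
    }

    public static void Main()
    {
        const int size = 2000000;
        const int runs = 10;
        Console.WriteLine("Yield return:  " + Measure(() => YieldReturnTest(size), runs));
        Console.WriteLine("Normal return: " + Measure(() => NormalReturnTest(size), runs));
    }
}
```

Timings will still vary between machines and runtimes, so the numbers are only meaningful relative to each other on the same setup.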

Philm