-7

A recent update caused a significant performance regression in my code. Using a profiler, I found that the regression came from a single line using LINQ. I did some testing and found that LINQ was much slower than foreach.

        // Requires: using System; using System.Collections.Generic;
        //           using System.Diagnostics; using System.Linq;
        List<int> numbers = new List<int>();
        for (int i = 0; i < 1000; ++i)
            numbers.Add(i);
        var stopWatch = new Stopwatch();

        // Warm-up pass so the JIT compiles this code before timing.
        {
            int total = 0;
            for (int i = 0; i < 1000; ++i)
                total += i;
        }
        stopWatch.Start();
        for (int j = 0; j < 1000000; ++j)
        {
            int total = 0;
            for (int i = 0; i < 1000; ++i)
                total += i;
        }
        stopWatch.Stop();
        Console.WriteLine("Benchmark run time: {0}", stopWatch.ElapsedMilliseconds);

        // Warm-up pass for the foreach test.
        {
            int total = 0;
            foreach (int i in numbers)
                total += i;
        }
        stopWatch.Restart();
        for (int j = 0; j < 1000000; ++j)
        {
            int total = 0;
            foreach (int i in numbers)
                total += i;
        }
        stopWatch.Stop();
        Console.WriteLine("foreach run time: {0}", stopWatch.ElapsedMilliseconds);

        // Warm-up pass for the Sum test.
        {
            int total = 0;
            total += numbers.Sum();
        }
        stopWatch.Restart();
        for (int j = 0; j < 1000000; ++j)
        {
            int total = 0;
            total += numbers.Sum();
        }
        stopWatch.Stop();
        Console.WriteLine("Sum run time: {0}", stopWatch.ElapsedMilliseconds);

Output:

Benchmark run time: 653
foreach run time: 3862
Sum run time: 10233

Does this mean we should always avoid using Linq in performance-critical sections?

Update:
  • Fixed the bug where the stopwatch was not reset.
  • Each test now runs once before the stopwatch starts, so the JIT has compiled it.
  • Yes, this is a Release build, run without debugging.

Fan
  • You're not even resetting your stopwatch... Also, is this even a release build? I have a hard time believing that it is as I would expect the `total` variable to be optimized out completely. Also, you should let the JIT do its thing (i.e., call each test once) prior to timing. – Ed S. May 15 '14 at 18:46
  • Note that system and .NET version is also highly significant to an answer. See also http://referencesource.microsoft.com/System.Core/System/Linq/Enumerable.cs.html – David May 15 '14 at 18:54
  • It's not a great pick for the duplicate as that one is about delayed execution. – H H May 15 '14 at 19:03
  • Fascinatingly, if you replace `total += numbers.Sum()` with `total = numbers.Sum()`, it shaves something like 2-3 seconds off the execution time, pretty consistently. – Bobson May 15 '14 at 19:07
  • @EdS. Code has been fixed. This is a release build. – Fan May 15 '14 at 19:14
  • @Aravol In your link I see Sum is implemented by foreach. So it is strange that it is that slow. – Fan May 15 '14 at 19:18
  • I was going to run this under the profiler to see why Sum was taking so much longer, but on my machine Sum took 8.767 seconds while foreach took 8.314 seconds. On the hypothesis that the checked arithmetic is the difference, I modified the console app to use checked arithmetic, and, lo and behold, the figures were 7.293 sec and 7.917. Odd that it got quicker. There's another thing Sum does that your code doesn't, which is the null check on the collection. – phoog May 15 '14 at 19:43
  • I just thought of another difference. Your code calls the list's public GetEnumerator property, which returns a value type. The Sum method calls the explicit IEnumerable implementation, which returns a reference to a boxed instance of the value-type enumerator. That extra indirection could be the reason for the difference. Indeed, modifying your code to cast to IEnumerable makes the difference. So it's not that Sum is unoptimized, it's that List<>.GetEnumerator() is optimized, but Sum can't take advantage of the optimization. – phoog May 15 '14 at 19:48
  • @phoog I have added checked and the null check, and they do not make any difference. But after I cast the List to IEnumerable, the performance of foreach became as bad as Sum. But I still do not know why on your machine they performed the same. I am using VS2010. Which version did you use? – Fan May 15 '14 at 20:12
  • @Bobson I tested total = numbers.Sum() and it did not reduce the run time for me. – Fan May 15 '14 at 20:15
  • @phoog I have done more testing. In my original code I was using IEnumerable, instead of List. The performance loss was actually caused by a Where before Sum. Where + Sum further increases the run time to 23000+. So now at least I know two things: 1) do not use LINQ in performance-critical sections; and 2) do not use IEnumerable either. – Fan May 15 '14 at 20:28
  • Good catch. I am using VS 2013. I recall seeing in the reference sources that some of the linq operators that were originally written as iterator blocks (that is, with yield return) have been rewritten; presumably that was for performance, and would explain our different results. – phoog May 16 '14 at 03:42
  • @phoog That is good news. Once we update to VS 2013 I will reprofile the code. I think you have the best answer but unfortunately I am not able to mark a comment as the best answer. – Fan May 16 '14 at 14:42
  • Here's the link: http://referencesource.microsoft.com/System.Core/System/Linq/Enumerable.cs.html. They replaced the iterator blocks (which are still present, commented out, beginning at line 391) with hand-written classes. The Where overload that provides an index to the predicate is still implemented as an iterator block, interestingly. – phoog May 16 '14 at 16:54
  • @phoog Just tested the code on VS2013. In framework 4.5, run time for Sum decreased by 2 sec, but still much longer. However, in framework 4.5.1, run time for foreach increased by 1 sec. – Fan May 16 '14 at 19:53
  • @phoog I looked at the IL code and it seems the question is answered here: [link](http://stackoverflow.com/questions/10634679/enumerating-via-interface-performance-loss). One enumerator is a value type and the other is a class. – Fan May 16 '14 at 21:01
  • @user2316040 I don't think that's the only difference, though; I modified the test to control for that difference and still saw some differences in the execution times. – phoog May 16 '14 at 22:25
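The thread above pins much of the gap on which enumerator `foreach` binds to. A minimal sketch of the effect (class name mine; the timing claims come from the discussion above, the code only demonstrates the two code paths):

```csharp
using System;
using System.Collections.Generic;

class EnumeratorBoxingDemo
{
    static void Main()
    {
        var numbers = new List<int>();
        for (int i = 0; i < 1000; ++i)
            numbers.Add(i);

        // foreach over the concrete List<int> binds to the public struct
        // List<int>.Enumerator: no heap allocation, non-virtual calls.
        int totalDirect = 0;
        foreach (int i in numbers)
            totalDirect += i;

        // foreach over an IEnumerable<int> view boxes that struct enumerator
        // and dispatches MoveNext()/Current through the interface -- the same
        // path Enumerable.Sum() takes internally.
        IEnumerable<int> view = numbers;
        int totalBoxed = 0;
        foreach (int i in view)
            totalBoxed += i;

        Console.WriteLine(totalDirect == totalBoxed); // True: same result, different per-iteration cost
    }
}
```

Both loops compute 499500; only the dispatch mechanism differs, which is why casting the list to `IEnumerable` reproduced the slowdown.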

3 Answers

7

You forgot to reset the `Stopwatch` after the foreach test.
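As a small self-contained sketch (class name mine) of why this matters: `Start()` alone keeps accumulating on top of the previous measurement, while `Restart()` zeroes the elapsed time first.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class RestartDemo
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        Thread.Sleep(100);               // stand-in for the first measured work
        sw.Stop();
        long first = sw.ElapsedMilliseconds;

        // Start() here would resume on top of 'first'.
        // Restart() resets the elapsed time and starts timing again.
        sw.Restart();
        Thread.Sleep(10);                // stand-in for the second measured work
        sw.Stop();

        Console.WriteLine(sw.ElapsedMilliseconds < first); // True: only the second interval was counted
    }
}
```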

Selman Genç
7
  1. Call `stopWatch.Reset()` before starting it again. Otherwise your perf test is worthless: you're accumulating the elapsed time across all the tests.

  2. Are you running the tests on a Release build, outside of Visual Studio?

  3. Yes, LINQ is not as fast as a plain for loop, but it's more readable and faster to write.
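To illustrate point 3, here is a hedged sketch (class name and data mine) of the trade-off between a one-line LINQ query and the equivalent explicit loop:

```csharp
using System;
using System.Linq;

class ReadabilityDemo
{
    static void Main()
    {
        int[] numbers = { 3, 1, 4, 1, 5, 9, 2, 6 };

        // LINQ: one declarative line, but pays for an enumerator and a delegate call.
        int linqSum = numbers.Where(n => n % 2 == 0).Sum();

        // Equivalent explicit loop: more code, no enumerator or delegate overhead.
        int loopSum = 0;
        foreach (int n in numbers)
            if (n % 2 == 0)
                loopSum += n;

        Console.WriteLine(linqSum);            // 12
        Console.WriteLine(linqSum == loopSum); // True
    }
}
```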

MarcinJuraszek
3

Your test was likely flawed; according to MSDN, you need to reset your stopwatch after each use.

That being said, LINQ can be slower, but whether or not it matters is a different story:

  • Microbenchmarks are dangerous: unless you actually perform the operation that many times, the performance difference will be negligible.
  • Some LINQ queries exit early, so writing an equivalent foreach isn't always as easy as it seems.

In short, if you run a profiler and see a real bottleneck, consider switching (and run the profiler again to make sure you fixed it!). It really is a case-by-case decision, though.
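The early-exit point can be sketched as follows (class name mine): `Any()` stops at the first match, and the hand-written "equivalent" loop is only equivalent if you remember the `break`.

```csharp
using System;
using System.Linq;

class EarlyExitDemo
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(0, 1000000).ToArray();

        // Any() stops at the first match; here it inspects only one element.
        bool linqFound = numbers.Any(n => n >= 0);

        // A naive hand-written version must remember to break,
        // or it scans all one million elements.
        bool loopFound = false;
        foreach (int n in numbers)
        {
            if (n >= 0) { loopFound = true; break; }
        }

        Console.WriteLine(linqFound && loopFound); // True
    }
}
```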

BradleyDotNET
  • By changing Sum back to foreach, I did see the performance come back. It is just frustrating to learn that LINQ is not as well optimized as I thought. – Fan May 15 '14 at 19:21