7

Preamble: I was changing a bit of code (manual Max search in an array) to some Linq.Max() super sexy written line, which brings me to questions about performance (I often deal with big arrays). So I made a little program to test, because I only trust what I see and got this results:

The size is now of 1 elements
With the loop it took:  00:00:00.0000015 
With Linq it took:      00:00:00.0000288 
The loop is faster: 94,79%
-----------------------------------------
The size is now of 10 elements
With the loop it took:  00:00:00 
With Linq it took:      00:00:00.0000007 
The loop is faster: 100,00%
-----------------------------------------
The size is now of 100 elements
With the loop it took:  00:00:00 
With Linq it took:      00:00:00.0000011 
The loop is faster: 100,00%
-----------------------------------------
The size is now of 1 000 elements
With the loop it took:  00:00:00.0000003 
With Linq it took:      00:00:00.0000078 
The loop is faster: 96,15%
-----------------------------------------
The size is now of 10 000 elements
With the loop it took:  00:00:00.0000063 
With Linq it took:      00:00:00.0000765 
The loop is faster: 91,76%
-----------------------------------------
The size is now of 100 000 elements
With the loop it took:  00:00:00.0000714 
With Linq it took:      00:00:00.0007602 
The loop is faster: 90,61%
-----------------------------------------
The size is now of 1 000 000 elements
With the loop it took:  00:00:00.0007669 
With Linq it took:      00:00:00.0081737 
The loop is faster: 90,62%
-----------------------------------------
The size is now of 10 000 000 elements
With the loop it took:  00:00:00.0070811 
With Linq it took:      00:00:00.0754348 
The loop is faster: 90,61%
-----------------------------------------
The size is now of 100 000 000 elements
With the loop it took:  00:00:00.0788133 
With Linq it took:      00:00:00.7758791 
The loop is faster: 89,84%

In short, Linq is almost 10 times slower, which bothered me a bit, so I took a look to the implementation of Max():

public static int Max(this IEnumerable<int> source) {
    if (source == null) throw Error.ArgumentNull("source");
    int value = 0;
    bool hasValue = false;
    foreach (int x in source) {
        if (hasValue) {
            if (x > value) value = x;
        }
        else {
            value = x;
            hasValue = true;
        }
    }
    if (hasValue) return value;
    throw Error.NoElements();
}

As the title already ask, what in this implementation make it 10 times slower? (And it's not about the ForEach, I've checked)

Edit:

Of course, I test in Release mode.

Here's my test code (without output):

//----------------
private int[] arrDoubles;
//----------------

Stopwatch watch = new Stopwatch();
//Stop a 100 Millions to avoid memory overflow on my laptop
for (int i = 1; i <= 100000000; i = i * 10)
{
    fillArray(i);
    watch.Restart();
    int max = Int32.MinValue; // Reset
    for (int j = 0; j < arrDoubles.Length; j++)
    {
        max = Math.Max(arrDoubles[j], max);
    }
    watch.Stop();

    TimeSpan loopSpan = watch.Elapsed;
    
    watch.Restart();
    max = Int32.MinValue; // Reset
    max = arrDoubles.Max();
    watch.Stop();

    TimeSpan linqSpan = watch.Elapsed;
}

//-------------------------------------------

private void fillArray(int nbValues)
{
    int Min = Int32.MinValue;
    int Max = Int32.MaxValue;
    Random randNum = new Random();
    arrDoubles = Enumerable.Repeat(0, nbValues).Select(i => randNum.Next(Min, Max)).ToArray();
}
Community
  • 1
  • 1
Thomas Ayoub
  • 29,063
  • 15
  • 95
  • 142
  • 4
    First thing to do is to use a loop to call the code under test more times; those times are far too short to tell you much. However, accessing elements via `IEnumerable<>` is a lot slower than accessing elements of an array. Is that what you are comparing it with? – Matthew Watson Jul 21 '15 at 13:27
  • 6
    You haven't shown us *your* loop. I suggest you copy the reference implementation into your project, see if you still get the same results, and then remove the differences between your implementation and the LINQ implementation until the performance is the same. – Heinzi Jul 21 '15 at 13:29
  • 5
    Possible not the execution of `Max()` ist the bottleneck, but the execution of a LINQ query on which you call `Max()`? – LInsoDeTeh Jul 21 '15 at 13:29
  • possible duplicate of [In .NET, which loop runs faster, 'for' or 'foreach'?](http://stackoverflow.com/questions/365615/in-net-which-loop-runs-faster-for-or-foreach) – Ian Mercer Jul 21 '15 at 13:52
  • You are comparing `foreach` and `for` which is a common question: http://stackoverflow.com/questions/365615/in-net-which-loop-runs-faster-for-or-foreach – Ian Mercer Jul 21 '15 at 13:54
  • 1
    @IanMercer The results are the same no matter if I use for or foreach as written in the question. This is not a dup about that. – Thomas Ayoub Jul 21 '15 at 13:59

2 Answers2

11

This is likely to be happening because accessing an array via IEnumerable<> is much slower than accessing from it's actual array type (even when using foreach).

The following code demonstrates this. Note how the code in max1() and max2() is identical; the only difference is the type of the array parameter. Both methods are passed the same object during testing.

Try running it from a RELEASE build (and not under the debugger, which will enable debug code even for a release build):

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace Demo
{
    public class Program
    {
        private static void Main(string[] args)
        {
            var array = new int[100000000];

            var sw = new Stopwatch();

            for (int trial = 0; trial < 8; ++trial)
            {
                sw.Restart();
                for (int i = 0; i < 10; ++i)
                    max1(array);
                var elapsed1 = sw.Elapsed;
                Console.WriteLine("int[] took " + elapsed1);

                sw.Restart();
                for (int i = 0; i < 10; ++i)
                    max2(array);
                var elapsed2 = sw.Elapsed;
                Console.WriteLine("IEnumerable<int> took " + elapsed2);

                Console.WriteLine("\nFirst method was {0} times faster.\n", elapsed2.TotalSeconds / elapsed1.TotalSeconds);
            }
        }

        private static int max1(int[] array)
        {
            int result = int.MinValue;

            foreach (int n in array)
                if (n > result)
                    result = n;

            return result;
        }

        private static int max2(IEnumerable<int> array)
        {
            int result = int.MinValue;

            foreach (int n in array)
                if (n > result)
                    result = n;

            return result;
        }
    }
}

On my PC, the int[] version is around 10 times faster than the IEnumerable<int> version.

Matthew Watson
  • 104,400
  • 10
  • 158
  • 276
2

Comparing two integers is fast in any scale (just a few cycles). Enumerating, however, takes longer, so you can say that LINQ spends more time enumerating than comparing - while your code has a direct access to any number in the array.

shay__
  • 3,815
  • 17
  • 34