5

Here's an enumeration of random integers:

var r = new Random();
var e = Enumerable.Repeat(1, 10_000_000).Select( _ => r.Next());

Which version do you think is faster:

var result = e.OrderBy(x => x).Last(); //materialize the IEnumerable

or

var l = e.ToList();
l.Sort();
var result = l.Last();

I was hoping that .OrderBy(x => x).Last() in the first example would be optimized to only sort a small subset of the list or to just do an O(n) walk of the list.

Spoiler: It's not.

But then, the performance of the two versions should be at least comparable, right? I mean:

In the first one, OrderBy() allocates a temp array and sorts it in place.
In the second one, I explicitly allocate a list and Sort() it in place.

The actual results show that the OrderBy() example is 4-5x slower! (5-6 sec vs 1.2-1.3 sec)

Can anyone come up with an explanation why?

The .OrderBy(x => x) case does execute its x => x identity lambda for each element.
The difference between:

var result2 = e.Last();

and

var result2 = e.Select(x=>x).Last();

is measurable, but small: 30 - 50 ms more in the second case. So that doesn't explain the huge gap.

Cristian Diaconescu
  • 34,633
  • 32
  • 143
  • 233
  • 3
    This is why I wish there were versions of `Min()` and `Max()` that returned the object instead of the value. It would avoid the need to sort the entire item set to get the object with the highest or lowest value. – itsme86 Aug 10 '18 at 14:32
  • @itsme86 What do you mean? You can always do `IEnumerable thingsList = ...; MyThing pubBum = thingsList.Max( thing => thing.BeersBeforeCollapsing ) ` – Cristian Diaconescu Aug 10 '18 at 14:36
  • 2
    That returns an `int` assuming `BeersBeforeCollapsing` is an `int`. It could also be a float I guess. – itsme86 Aug 10 '18 at 14:37
  • No, it returns a `MyThing` – Cristian Diaconescu Aug 10 '18 at 14:38
  • Not on my computer... `MyObj[] foo = { new MyObj { Blee = 7 } }; var bar = foo.Max(o => o.Blee);` `bar` is 7, not the `MyObj` object. If you were able to do that, you wouldn't have to do your `.OrderBy().Last()`, you could just `.Max()`. – itsme86 Aug 10 '18 at 14:42
  • 1
    Oh wow, you're right! Sorry, I would have bet on this (and lost money!) – Cristian Diaconescu Aug 10 '18 at 14:44
  • The reason I used `.Last()` was just a way to force the execution of the Linq chain. I initially wanted to add a `.ToList()` at the end, but that would have been another allocation. – Cristian Diaconescu Aug 10 '18 at 14:47
  • Makes sense. Didn't mean to derail your question. – itsme86 Aug 10 '18 at 14:48
  • 2
    LINQ to Object doesn't do any clever fusion optimizations. `OrderBy().Last()` is `OrderBy()` followed by `Last()`, and not something that's faster than the individual operations. You're basically asking why the in-place sort operation `List.Sort` (which uses [introsort](https://en.wikipedia.org/wiki/Introsort)) is faster than `Enumerable.OrderBy` (which uses quicksort, hobbled by the requirement of the comparison going through a key selector lambda). If you wanted to get to the bottom of that, [benchmark.net](https://benchmarkdotnet.org/) could probably tell you. – Jeroen Mostert Aug 10 '18 at 14:54
  • Related: [C# Sort and OrderBy comparison](https://stackoverflow.com/q/1832684/4934172). However, the result seems to be different when dealing with objects? – 41686d6564 stands w. Palestine Aug 10 '18 at 15:09
  • @itsme86 That's why I have `Aggregate` based `MaxBy` and `MinBy` extensions written. – NetMage Aug 10 '18 at 18:21
  • I would use `First` instead of `Last`, which avoids the unnecessary traversal of the result since you are just trying to force execution of `OrderBy` in this case. – NetMage Aug 10 '18 at 18:24
  • It is fast to convert a list to an array. Then you can just use the Max() method. See this example: https://www.dotnetperls.com/max. – Dan Randolph Aug 10 '18 at 19:36

1 Answers1

4

It appears that List has a special optimized C++ version of sorting it uses when it is comparing types with the Comparer.Default or no IComparer for the type. OrderBy always does a generic sort suitable for any type and IComparer.

If you replace the Select result with objects of type MyInt as follows:

public class MyInt : IComparable {
    public int value;
    public MyInt(int newv) => value = newv;

    public int CompareTo(object obj) {
        if (obj is MyInt m2) {
            return value.CompareTo(m2.value);
        }
        else if (obj is int i2) {
            return value.CompareTo(i2);
        }
        else {
            throw new Exception($"can't compare MyInt to {obj.GetType()}");
        }
    }
}

var e = Enumerable.Repeat(1, 10_000_000).Select(_ => new MyInt(r.Next()));

Then OrderBy will be 2 times faster than the List.Sort method.

Note, if you use Comparer<>.Create to create a Comparer for MyInt, List.Sort is about on par with OrderBy:

l.Sort(Comparer<MyInt>.Create((m1,m2) => m1.value.CompareTo(m2.value)));
NetMage
  • 26,163
  • 3
  • 34
  • 55