2

Suppose we have a class Foo which has an Int32 Field Bar and you want to sort the collection of the Foo objects by the Bar Value. One way is to implement IComparable's CompareTo() method, but it can also be done using Language Integrated Query (LINQ) like this

List<Foo> foos = new List<Foo>();
// assign some values here
var sortedFoos = foos.OrderBy(f => f.Bar);

now in sortedFoos we have foos collection which is sorted. But if you use System.Diagnostics.StopWatch object to measure the time it took OrderBy() to sort the collection is always 0 milliseconds. But whenever you print sortedFoos collection it's obviously sorted. How is that possible. It takes literary no time to sort collection but after method execution the collection is sorted ? Can somebody explain to me how that works ? And also one thing. Suppose after sorting foos collection I added another element to it. now whenever I print out the collection the the element I added should be in the end right ? WRONG ! The foos collection will be sorted like, the element I added was the part of foos collection even if I add that element to foos after sorting. I don't quit understand how any of that work so can anybody make it clear for me ?!

Dimitri
  • 2,798
  • 7
  • 39
  • 59
  • 2
    Some of the LINQ methods are lazily evaluated - you can read up on [deferred execution here, examples with LINQ 2 XML](http://msdn.microsoft.com/en-us/library/bb943859.aspx). The collection will be really ordered when it's materialized (by e.g. calling `ToList()`). – Patryk Ćwiek Jun 19 '13 at 09:01
  • Because LINQ is deferred execution, you should call `ToList()` to trigger execution immediately – cuongle Jun 19 '13 at 09:02
  • My own question a while ago is related and might help to understand linq's deferred execution because Eric/Jon has explained it very well: http://stackoverflow.com/questions/10110013/order-of-linq-extension-methods-does-not-affect-performance – Tim Schmelter Jun 19 '13 at 09:03

4 Answers4

5

Almost all LINQ methods use lazy evaluation - they don't immediately do anything, but they set up the query to do the right thing when you ask for the data.

OrderBy follows this model too - although it's less lazy than methods like Where and Select. When you ask for the first result from the result of OrderBy, it will read all the data from the source, sort all of it, and then return the first element. Compare this to Select for example, where asking for the first element from the projection only asks for the first element from the source.

If you're interested in how LINQ to Objects works behind the scenes, you might want to read my Edulinq blog series - a complete reimplementation of LINQ to Objects, with a blog post describing each method's behaviour and implementation.

(In my own implementation of OrderBy, I actually only sort lazily - I use a quick sort and sort "just enough" to return the next element. This can make things like largeCollection.OrderBy(...).First() a lot quicker.)

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • My whole answer is inspired by reading your excellent linked blog. Kudos for rewriting the `IOrderedEnumerable` implementation but faster, without any "black magic." – Jodrell Jun 19 '13 at 09:33
  • @Jodrell: From what I remember around the time I was implementing it, it felt like very black magic indeed ;) – Jon Skeet Jun 19 '13 at 09:39
3

LINQ believe in deferred execution. This means the expression will only be evaluated when you started iterating or accessing the result.

S.N
  • 4,910
  • 5
  • 31
  • 51
  • _"With deferred execution, the above iterates your collection one time"_ This seems to be incorrect, `foos.OrderBy(f => f.Bar)` does not iterate the sequence (or original collection) at all. I assume it's just expressed inaccurately. – Tim Schmelter Jun 19 '13 at 09:08
1

The OrderBy extension uses the deault IComparer for the type it is working on, unless an alternative is passed via the appropriate overload.

The sorting work is defered until the IOrderedEnumerable<T> returned by your statement is first accessed. If you place the Stopwatch around that first access, you'll see how long sorting takes.

This makes a lot of sense, since your statement could be formed from multiple calls that return IOrderedEnumerables. As the ordering calls are chained, they fluently extend the result, allowing the finally returned IOrderedEnumerable to perform the sorting in the most expedient way possible. This is probably achieved by chaining all the IComparer calls and sorting once. A greedy implementation would have to wastefully sort multiple times.

For instance, consider

class MadeUp
{
    public int A;
    public DateTime B;
    public string C;
    public Guid D;
}

var verySorted = madeUps.OrderBy(m => m.A)
                    .ThenBy(m => m.B)
                    .ThenByDescending(m => m.C)
                    .ThenBy(m => m.D);

If verySorted was evaluated greedily, then every property in the sequence would be evaluated and the sequence would be reordered 4 times. Because the linq implementation of IOrderedEnumerable deferes sorting until enumeration, it is able to optimise the process.

The IComparers for A, B, C and D can be combined into a composite delegate, something like this simplified representation,

int result
result = comparerA(A, B);
if (result == 0)
{
    result = comparerB(A, B);
    if (result == 0)
    {
        result = comparerC(A, B);
        if (result == 0)
        {
            result = comparerD(A, B);
        }
    }
}

return result;

the composite delegate is then used to sort the sequence once.

Jodrell
  • 34,946
  • 5
  • 87
  • 124
0

You have to add the ToList() to get a new collection. If you do't the OrderBy will be called when you start iterating on sortedFoos.

List<Foo> foos = new List<Foo>();
// assign some values here
var sortedFoos = foos.OrderBy(f => f.Bar).ToList();
Peter
  • 27,590
  • 8
  • 64
  • 84