
I needed a TakeLast<T>(int n)-style LINQ function. I came across this StackOverflow answer: https://stackoverflow.com/a/3453282/825011, which I liked simply because it was a simple implementation. Then a colleague of mine pointed out that the Reverse() should be more costly than Skip(length - n), which prompted me to write a test.

Here are the competing functions.

public static IEnumerable<T> TakeLast<T>( this IEnumerable<T> c, int n ) {
    return c.Reverse().Take( n ).Reverse();
}


public static IEnumerable<T> TakeLast2<T>( this IEnumerable<T> c, int n ) {
    var count = c.Count();
    return c.Skip( count - n );
}
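As a quick sanity check, both extensions do produce the same elements (a minimal sketch; the Program wrapper and variable names are mine, not from the original post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TakeLastExtensions {
    // Same implementations as above.
    public static IEnumerable<T> TakeLast<T>( this IEnumerable<T> c, int n ) {
        return c.Reverse().Take( n ).Reverse();
    }

    public static IEnumerable<T> TakeLast2<T>( this IEnumerable<T> c, int n ) {
        var count = c.Count();
        return c.Skip( count - n );
    }
}

class Program {
    static void Main() {
        var source = Enumerable.Range( 0, 100000 );

        // Called as static methods to avoid clashing with the built-in
        // Enumerable.TakeLast that newer frameworks ship.
        var a = TakeLastExtensions.TakeLast( source, 10 ).ToList();
        var b = TakeLastExtensions.TakeLast2( source, 10 ).ToList();

        Console.WriteLine( string.Join( ", ", a ) ); // 99990, 99991, ..., 99999
        Console.WriteLine( a.SequenceEqual( b ) );   // True
    }
}
```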

I timed obtaining the last 10 elements of the enumeration Enumerable.Range( 0, 100000 ) and found that:

  1. TakeLast() is ~5x faster.
  2. Enumerations of TakeLast() are significantly faster after the first enumeration.

Here's the .NET Fiddle for my code (originally run locally, but also demonstrated here): http://dotnetfiddle.net/ru7PZE

Questions

  1. Why is TakeLast() faster?
  2. Why are the second and third enumerations of TakeLast() faster than the first, but all enumerations of TakeLast2() are about the same?
Sam Rueby
  • Because it has already been enumerated, and it seems that calling Reverse is getting optimized. – csharpwinphonexaml Apr 07 '14 at 14:46
  • This is probably closely related to the implementation of `Enumerable.Range`. If you actually care about the performance of, say, `List`, you should profile with that instead. If you actually care about the performance of `Range`, you could just create a new `Range` with the proper start/count values. – Tim S. Apr 07 '14 at 14:50
  • By looking at the results I assume the compiler is storing the enumerable as a linked list. So it knows the start of the list and the end, which makes reversing it trivial and grabbing the first X elements pretty trivial, as opposed to iterating over the list to skip the first n elements. But this is just a guess, as I do not know how the compiler is storing the list. – gh9 Apr 07 '14 at 14:50
  • Your benchmarks are just flawed... – sloth Apr 07 '14 at 14:50
  • using the lists seems pretty similar http://dotnetfiddle.net/CfSo8b – csharpwinphonexaml Apr 07 '14 at 14:51
  • @DominicKexel, can you explain? – Sam Rueby Apr 07 '14 at 14:52
  • To fix your benchmark, materialize the results *before* you print the time, e.g. with `enumerable.TakeLast(10).ToList()`. This is (how to fix) the flaw that Dominic and Servy are referring to. – Tim S. Apr 07 '14 at 14:52
  • @Sam.Rueby Servy already posted an answer explaining it. – sloth Apr 07 '14 at 14:52
  • This is a bit of a guess, but wouldn't `TakeLast` actually enumerate 100000 elements for the first reverse, then take N elements, and then reverse just N elements, whereas `TakeLast2` will enumerate 100000 elements for `.Count()` and then almost 100000 for `.Skip()`? Thus the second one goes through the large set twice, whereas the former goes through it once. – LB2 Apr 07 '14 at 14:54
  • @LB2 Yes, but the benchmark is flawed. The second does the first iteration *eagerly* and the second *lazily*. The first method does the entire operation lazily. Since he records the time before printing the results, everything that is deferred is not counted in the benchmark. – Servy Apr 07 '14 at 14:56
  • @LB2: that was basically our thinking. – Sam Rueby Apr 07 '14 at 14:56
  • @Servy Yes, that too indeed... – LB2 Apr 07 '14 at 14:58
  • @verdesrobert Your benchmark is *also* not actually executing the query, just constructing the query. The fact that the input implements `IList` simply means that `Count` doesn't need to iterate the sequence, it can get the count for free, which removes the difference in times *generating* the query. It also allows `Skip` to skip right to the correct spot. The source sequence implementing `IList` is the best case for the second method; the first method benefits, but not *as much*. – Servy Apr 07 '14 at 15:01

1 Answer


You're not materializing the results of your query until after you print out the stopwatch's elapsed time. LINQ queries use deferred execution: the query doesn't actually execute until it is enumerated. In the case of your second method, you're calling Count before building the query, and Count needs to enumerate the entire result set to compute its value. This means that your second method has to iterate the sequence each time, whereas the first method is able to defer all of its work until after you display the time. I would expect the first method to have more work to do in many situations; it just succeeds at waiting until after you're done timing it.
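The fix is to force the query to run before reading the stopwatch, e.g. by appending ToList(). A minimal corrected harness might look like this (a sketch; the names and output formatting are mine, not the original Fiddle code):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program {
    static IEnumerable<T> TakeLast<T>( IEnumerable<T> c, int n ) {
        return c.Reverse().Take( n ).Reverse();
    }

    static IEnumerable<T> TakeLast2<T>( IEnumerable<T> c, int n ) {
        var count = c.Count();
        return c.Skip( count - n );
    }

    static void Main() {
        var source = Enumerable.Range( 0, 100000 );

        var sw = Stopwatch.StartNew();
        var a = TakeLast( source, 10 ).ToList();  // ToList() forces the deferred query to execute
        sw.Stop();
        Console.WriteLine( "TakeLast:  " + sw.ElapsedTicks + " ticks" );

        sw = Stopwatch.StartNew();
        var b = TakeLast2( source, 10 ).ToList(); // both methods now do their real work inside the timed region
        sw.Stop();
        Console.WriteLine( "TakeLast2: " + sw.ElapsedTicks + " ticks" );
    }
}
```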

As for why the first is faster when called multiple times, that pretty much comes down to the JIT warmup that takes place when executing any code. The second method does get that speedup too, but since it can't omit iterating the sequence each time (which is a large portion of its cost), the percentage speedup is much smaller.

Note that your second query iterates the source sequence twice (unless the enumerable happens to implement ICollection). If the object is something that can be efficiently iterated multiple times, that's not a problem; if it implements IList it will in fact be much faster. But if it is something like an IQueryable that needs to execute an expensive query against a database each time it's iterated, it will need to do that twice, not once. And if it's a query that doesn't even have the same contents each time it's iterated, that can cause all sorts of problems.
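That double pass is easy to observe with a sequence that counts its own enumerations (CountingSequence is a hypothetical helper for illustration, not part of the original post):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Counts how many times the sequence is fully enumerated.
class CountingSequence : IEnumerable<int> {
    public int Iterations;

    public IEnumerator<int> GetEnumerator() {
        Iterations++;
        for ( int i = 0; i < 1000; i++ ) yield return i;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

class Program {
    static void Main() {
        var s1 = new CountingSequence();
        s1.Reverse().Take( 10 ).Reverse().ToList(); // first method: Reverse buffers the source in a single pass
        Console.WriteLine( s1.Iterations );         // 1

        var s2 = new CountingSequence();
        s2.Skip( s2.Count() - 10 ).ToList();        // second method: Count() is one pass, Skip() is another
        Console.WriteLine( s2.Iterations );         // 2
    }
}
```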

Servy