2

I recently wondered why the Enumerable-extension methods like Skip, SkipWhile or TakeWhile are not optimized in order that they use a for-loop instead of enumerating all elements.

Other methods like ElementAt first check if the sequence can be casted to IList<T>, then they use the indexer to find the element faster. Similar with First, Last, Count etc., they first check if IEnumerable<T> can be casted to a collection which supports faster lookups.

Why are the Skip methods not omptimized in this way?

Here is what i've got from ILSpy(.NET 4), the SkipIterator is used for for all skip methods, but similar code is used for TakeWhile:

// System.Linq.Enumerable
private static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count)
{
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        while (count > 0 && enumerator.MoveNext())
        {
            count--;
        }
        if (count <= 0)
        {
            while (enumerator.MoveNext())
            {
                yield return enumerator.Current;
            }
        }
    }
    yield break;
}

As you can see it doesn't even try to use a for-loop if possible.

Why is the code not similar to this?

public static IEnumerable<TSource> MySkip<TSource>(this IEnumerable<TSource> source, int count)
{
    // check if it's a list or array to use a for-loop
    IList<TSource> listT = source as IList<TSource>;
    if (listT != null)
    {
        if (listT.Count <= count) yield break;
        for(int i = count; i < listT.Count; i++)
            yield return listT[i];
    }

    // no list or array, fallback to .NET method
    foreach (TSource t in source.Skip(count))
        yield return t;
}

I have also tested this extension method against the default Skip, here is the senseless query that checks how many elements in a list have a follower(always count-1), but it should do the job:

Random rnd = new Random();
var randomNumbers = Enumerable.Range(1, 100000)
    .Select(i => new { Number = rnd.Next(100) })
    .ToList();

Stopwatch sw1 = new Stopwatch();
sw1.Start();

int countAllButLast = randomNumbers
    .TakeWhile((s, i) => randomNumbers.Skip(i).Any())
    .Count();

sw1.Stop();
Console.WriteLine("Time for default Skip with foreach: {0} Result: {1}", sw1.Elapsed, countAllButLast);

Stopwatch sw2 = new Stopwatch();
sw2.Start();

countAllButLast = randomNumbers
    .TakeWhile((s, i) => randomNumbers.MySkip(i).Any())
    .Count();

sw2.Stop();
Console.WriteLine("Time for MySkip with for-loop: {0} Result: {1}", sw2.Elapsed, countAllButLast);

The performance difference is tremendous:

Time for default Skip with foreach:     00:01:16.5630092 Result: 99999
Time for MySkip with for-loop:          00:00:00.0334746 Result: 99999
Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939

0 Answers0