0

I tried doing the same thing using Linq and non-Linq methods and found out that Linq is significantly slower (~3000x).

Why is that?

Linq way:

for (int i = 0; i < totalElements; i += stepSize)
{
    var currentBlock = testList
        .Skip(i)
        .Take(stepSize);

    result.Add(currentBlock.Sum());
}

result.ToList();

Non-Linq way:

for (int i = 0; i < totalElements; i += stepSize)
{
    var currentBlock = testList.GetRange(i, stepSize);

    result2.Add(currentBlock.Sum());
}

result2.ToList();

Results:

  • Method: Linq, Time taken: 26667 ms, Elements: 1000000, Step Size: 100

  • Method: GetRange, Time taken: 9 ms, Elements: 1000000, Step Size: 100

Full source code as requested:

static void Main(string[] args)
{
    var totalElements = 1000000;
    var testList = new List<int>(totalElements);
    var rand = new Random();

    // Initialize the list to random integers between 1 and 1000
    for (int i = 0; i < totalElements; i++)
    {
        testList.Add(rand.Next(1, 1000));
    }

    var result = new List<int>();
    var stepSize = 100;
    var stp = new Stopwatch();

    stp.Start();
    for (int i = 0; i < totalElements; i += stepSize)
    {
        var currentBlock = testList
            .Skip(i)
            .Take(stepSize);

        result.Add(currentBlock.Sum());
    }

    result.ToList();
    stp.Stop();

    Console.WriteLine($"Method: Linq, Time taken: {stp.ElapsedMilliseconds} ms, Elements: {totalElements}, Step Size: {stepSize}");

    stp.Reset();

    var result2 = new List<int>();
    stp.Start();

    for (int i = 0; i < totalElements; i += stepSize)
    {
        var currentBlock = testList.GetRange(i, stepSize);

        result2.Add(currentBlock.Sum());
    }

    result2.ToList();
    stp.Stop();

    Console.WriteLine($"Method: GetRange, Time taken: {stp.ElapsedMilliseconds} ms, Elements: {totalElements}, Step Size: {stepSize}");
}
Dipen
  • 81
  • 1
  • 8
  • 2
    It's very hard to tell or investigate further with only tiny snippets. Please provide a [mcve]. – Jon Skeet Feb 08 '18 at 16:58
  • 2
    I don't think this is a duplicate of general "linq performance" questions. 26 seconds vs 9ms is extreme. Something weird is going on, probably in the code we can't see. – Blorgbeard Feb 08 '18 at 17:00
  • @JonSkeet I've added the full source code that I used for testing – Dipen Feb 08 '18 at 17:06
  • 2
    `Skip` and `GetRange` are not equivalent at all, they work very differently. It's true that `Skip` maybe should have been optimized in cases where `source` implements `IList`, similar to how other `IEnumerable` extension methods are implemented. Not sure why they didn't do it... – InBetween Feb 08 '18 at 17:09
  • 1
    I get a more modest difference: ~800ms vs ~7ms https://dotnetfiddle.net/QhjkVA – Blorgbeard Feb 08 '18 at 17:14
  • @Blorgbeard the total element in the question is 1M, not 100K. – Sphinx Feb 08 '18 at 17:15
  • @Sphinx looks like `100000` to me. I copy and pasted. – Blorgbeard Feb 08 '18 at 17:19
  • @Sphinx oh I see in the output, but not in the code. That probably explains it. – Blorgbeard Feb 08 '18 at 17:20
  • @Blorgbeard Sorry for the confusion, I initially tested with 1M, but pasted the 100000 code in the edit. – Dipen Feb 08 '18 at 18:54

2 Answers2

5

The problem is how Skip works, which is radically different from GetRange. Skip always starts at the beginning of the enumeration which means you are doing the following:

Iteration #1: Skip 0
Iteration #2: Skip 1 * step
Iteration #3: Skip 2 * step
Iteration #4: Skip 3 * step
Iteration #5: Skip 4 * step
....
Iteration #1.000: Skip 9.999 * step

If you do the maths for 1.000.000 elements and a step of 100 you get:

sum = 1 + 2 + 3 + .... + 9.999 = 9.999 * (9.999 + 1) / 2 = 49.995.000
total elements skipped: 49.995.000 * 100 = 4.999.500.000

So, your Linq version has a whopping 4.999.500.000 unnecessary iterations.

A good question here would be: why hasn't Skip been optimized for cases where source implements IList<T>, because, plainly, it would be possible.

InBetween
  • 32,319
  • 3
  • 50
  • 90
  • 3
    See also https://codeblog.jonskeet.uk/2011/01/02/reimplementing-linq-to-objects-part-23-take-skip-takewhile-skipwhile/ which talks about this. – Jon Skeet Feb 08 '18 at 17:35
0

GetRange uses Skip(). It always starts enumerating at the beginning. What you'd like to have is a function that divides you sequence into chunks without iterating over the sequence more than really needed.

This means that if you only want the first Chunk the function should not iterate over more than this Chunk, and if I want the 10th chunk after the 9th Chunk, it should not start iterating at the beginning.

How about this extension function?

public static IEnumerable<IEnumerable<Tsource>> ToChuncks<TSource>(
    this IEnumerable<TSource> source, int chunkSize)
{
    while (source.Any())                 // while there are elements left
    {   // still something to chunk
        // yield return a chunk
        yield return source.Take(chunkSize); // return a chunk of chunkSize

        // remove the chunk from the source
        source = source.Skip(chunkSize);     // skip the returned chunk
    }
}

This function repeatedly checks if there is still something left in source sequence. If so, it yield returns a Chunk of data and removed the chunk from your source.

This way your complete source is iterated at utmost twice: once if you iterate over the the elements in a Chunk, and once if you iterate over the Chunks.

Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116