3

I have a case where I want to iterate all but the last 2 elements of a collection.

Let's say I do it a weird way like x.Reverse().Skip(2).Reverse().

Will each LINQ operation generate effectively a nested iterator or cause enumeration, etc or is it smarter than that? What happens under the hood in this sort of case?


Clarification: this is just one example of chained LINQ statements you might see, where a developer favours short powerful code without thinking too hard about performance - maybe they are a Computer Science student and this seems the 'cleverest' solution. I am not asking how to solve this particular example

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
Mr. Boy
  • 60,845
  • 93
  • 320
  • 589
  • Does [this](https://stackoverflow.com/questions/10110013/order-of-linq-extension-methods-does-not-affect-performance?noredirect=1&lq=1) answer your question? – Sweeper May 01 '20 at 18:08
  • 2
    `Reverse` has to iterate the entire collection. Depending on the underlying type `Count` might not have to iterate the entire collection. – juharr May 01 '20 at 18:08
  • What is the type of underlying source collection `x`? Which .NET version are you targeting so far? – Pavel Anikhouski May 01 '20 at 18:08
  • 1
    Why do you want to avoid `Count()`? – Sweeper May 01 '20 at 18:09
  • 1
    Any chance you could just use [`SkipLast()`](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.skiplast) and just let the implementation handle the query optimization? – Patrick Roberts May 01 '20 at 18:20
  • Unless the collection is very large, I would write the code in the way that is easiest to understand. But if you are worried about microoptimizing, or just curious how it all works, you can always look at the [reference source code](https://github.com/microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs). – John Wu May 01 '20 at 18:20
  • 1
    You may use [`SkipLast`](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.skiplast?view=netcore-3.1) method, which was added starting from .NET Core 2.x – Pavel Anikhouski May 01 '20 at 18:27
  • Or you can use the `SkipLast` in MoreLinq if you're still on .Net Framework – juharr May 01 '20 at 18:30
  • @juharr I believe `List.Count` is O(1), not sure about the others. – RoadRunner May 01 '20 at 19:04
  • 1
    @RoadRunner `Count()` takes advantage of anything that implements `ICollection` or `ICollection` to use the `Count` property which would almost always be O(1). – juharr May 01 '20 at 19:11
  • @juharr Yeah that would make sense. Thanks for clarifying :) – RoadRunner May 01 '20 at 19:12
  • Actually `Reverse` also takes advantage of anything that is `ICollection` by using it's `CopyTo` go get the values into a buffer array to iterate backwards over, so the double `Reverse` on a `ICollection` might not be that bad, but definitely has more overhead (especially in memory usage). – juharr May 01 '20 at 19:14
  • @juharr I had figured it would do deferred execution and just iterate backwards where possible. Not the case? This is the kind fo thing the question is here to ask. You see people chaining LINQ for short powerful functionality but it's not very clear how performant it is – Mr. Boy May 01 '20 at 19:53
  • `SkipLast` is a nice addition to Core though not available to me - but the question is not meant to be that specific. – Mr. Boy May 01 '20 at 19:55
  • 2
    It's important to understand what each Linq method does. Most like `Select` and `Where` can yeild results as they iterate their source. But you have to be aware of things like `Reverse` or `OrderBy`, or even `GroupBy` that have to iterate the entire source before they can even begin to yield results. – juharr May 01 '20 at 22:10

3 Answers3

2

First off yes that is creating a "iterator" and does not actually do any iterating until you materialized the query in a foreach or by calling ToList on it. When you do that the number of iterations that occur are based on the underlying type. Reverse will create a buffer array for whatever source you give it and iterate over it backwards. If the source is ICollection<T> then it will use its CopyTo method to populate the array which should usually result in a single bulk copy of contiguous data in constant time. If it's not a ICollection<T> then it will iterate the source into the buffer and then iterate over it backwards. With that in mind here's what happens for your specific query when you iterate.

First the last Reverse will start iterating its source (which is not an ICollection<T>).

Then the Skip will start iterating its source

Then the first Reverse will either do a CopyTo if its source is a ICollection<T> or it will iterate the source into a buffer array that it resizes as needed.

Then the first Reverse will iterate over its buffer array backwards

Then the Skip will take the results skipping the first two and yielding the rest

Then the last Reverse will take the result and add them to its buffer array and resize it as needed.

Finally the last Reverse will iterate the buffer array backwards.

So if you're dealing with a ICollecion<T> that's one CopyTo and then 1 iteration of all of the values and then 1 iteration of all but 2 of the values. If it's not a ICollection<T> that's basically 3 iterations of the values (really the last iteration is of all but 2). And either way it's also using two intermediate arrays in the process.

And to prove that the query does no iterations until you materialize it you can check out this example

void Main()
{
    var query = MyValues().Reverse().Skip(2).Reverse();
    Console.WriteLine($"After query before materialization");
    var results = query.ToList();
    Console.WriteLine(string.Join(",", results));
}

public IEnumerable<int> MyValues()
{
    for(int i = 0; i < 10; i ++)
    {
        Console.WriteLine($"yielding {i}");
        yield return i;
    }
}

Which produces the output

After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
0,1,2,3,4,5,6,7

When compared to the other example you had x.Take(x.Count() - 2), that will iterate the source before you materialize it once for the Count (unless it's ICollection or ICollection<T> in which case it will just use the Count property) then it will iterate it again when you materialize it.

Here's the same example with the different code and the resulting output.

void Main()
{
    var x = MyValues();
    var query = x.Take(x.Count() - 2);
    Console.WriteLine($"After query before materialization");
    var results = query.ToList();
    Console.WriteLine(string.Join(",", results));
}

public IEnumerable<int> MyValues()
{
    for(int i = 0; i < 10; i ++)
    {
        Console.WriteLine($"yielding {i}");
        yield return i;
    }
}

With the output

yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
0,1,2,3,4,5,6,7

So which one is better is completely dependent on the source. For a ICollection<T> or ICollection the Take and Count would be preferred (unless the source might change between when the query is created and when it is materialized), but if it's neither of those you might prefer the double Reverse to avoid iterating the source twice (especially if the source can change between when you create the query and actually materialize it as the size might change as well), but that has to be weighted against the increase in total iterations done and memory usage.

juharr
  • 31,741
  • 4
  • 58
  • 93
1

Most LINQ operation do not produce separate nested iteration. Though Count() has to iterate thru complete sequence.

As for the content of your question, please refer to: How to take all but the last element in a sequence using LINQ?

Riad Baghbanli
  • 3,105
  • 1
  • 12
  • 20
  • 1
    `Count` does have a few optimizations, based on detecting the underlying type, where it doesn't actually have to iterate through the entire sequence. Here is the [source](https://referencesource.microsoft.com/#system.core/system/linq/Enumerable.cs,41ef9e39e54d0d0b,references) for reference. – Bradley Uffner May 01 '20 at 19:32
  • The content of my question is not how to take all but the last elements. That is just an example. The question is about how chained LINQ operations actually _work_ – Mr. Boy May 01 '20 at 19:51
  • If you believe the question to be the same as another question, you should be voting to close the question as a duplicate, not answering it... – Heretic Monkey May 01 '20 at 20:00
  • @Mr.Boy If you really want to understand how LINQ works under the hood, you should check the reference source for [`Enumerable`](https://referencesource.microsoft.com/#system.core/system/linq/Enumerable.cs) as that is where the majority of it is implemented. – Bradley Uffner May 01 '20 at 20:09
  • Every question about .Net can be answered "read the source"... not a particularly helpful answer :) I've edited my question to try and make it a bit clearer – Mr. Boy May 01 '20 at 20:11
  • Fully explaining LINQ is beyond the scope of this site. – Bradley Uffner May 01 '20 at 20:13
1

Whenever you want to know how a LINQ statement works, how efficient it is, it might be a good idea to look at the source code (google: reference source Enumerable reverse)

Here you'll find that as soon as you start enumerating your sequence (this is: use a non-deferred method = use a LINQ method that does not return IEnumerable, or use foreach), the first Reverse will enumerate your complete sequence once, but it in a buffer, and start iteratingbackwards from the last element.

Your skip(2) will only enumerate 2 elements.

The 2nd Reverse will make a new Buffer, containing these two elements and start iterating backwards: so in your original sequence forwards.

If you look what happens: the elements of your original sequence are put in a buffer, the last and the pre-last elements are put in a second buffer. This second buffer is iterated: pre-last then last element.

So every element it iterated once, the last two elements are iterated once again. If iterating is a lot of work, consider creating a List, then take the last two elements. This will iterate your elements only once.

If you have other LINQ statements of which you wonder how it is done, lool at the source code

Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116