-3

I have been wondering for a while, and it keeps bugging me, which way of writing a foreach statement that uses LINQ is the most efficient.

As far as I know a ToList() creates an object in memory while an IEnumerable makes a reference and only when the data is needed it filters the data for use.

The question is, does the foreach statement call the List / IEnumerable on each iteration, or does it do it once and keep that object/List in memory?

Looking at the following, which option would be the most efficient, and for what reason?

  1. Option A

    foreach (Car car in CarList.Where(x => x.Make == "BMW")) {}
    
  2. Option B

    foreach (Car car in CarList.Where(x => x.Make == "BMW").ToList()) {}
    
  3. Option C

    IEnumerable<Car> myCarList = CarList.Where(x => x.Make == "BMW");
    foreach (Car car in myCarList) {}
    
  4. Option D

    IEnumerable<Car> myCarList = CarList.Where(x => x.Make == "BMW").ToList();
    foreach (Car car in myCarList) {}
    
Elmer
  • 1
    My guess would be A and C are the fastest, but it really depends on what `CarList` is. As with anything performance related, the only way to tell is to test it yourself. – DavidG Aug 19 '21 at 08:42
  • 1
    You never have to ask this type of question again https://github.com/dotnet/BenchmarkDotNet – TheGeneral Aug 19 '21 at 08:43
  • 4
    @faso No, never use Stopwatch for benchmarking, it is massively unreliable. Use a proper tool like BenchmarkDotNet – DavidG Aug 19 '21 at 08:44
  • 1
    What @DavidG said – TheGeneral Aug 19 '21 at 08:45
  • `As far as I know a ToList() creates an object in memory while an IEnumerable makes a reference and only when the data is needed it filters the data for use.` Generally `IEnumerable` doesn't make a reference or an object in memory (of the type you are interested in) _unless you iterate it_. – mjwills Aug 19 '21 at 08:45
  • 1
    a and c are basically the same, as is b and d. Generally speaking a and c will be faster since they don't incur the cost of creating and resizing a list (how much faster depends largely on the size of the list). Option e (a check inside a standard foreach without using LINQ at all) will almost always be faster than all of them. – mjwills Aug 19 '21 at 08:46
  • `The question is, does the foreach statement call the List / IEnumerable on each iteration, or does it do it once and keep that object/List in memory?` I don't understand what that question means. If you want to keep the results of the filtering in memory, use `ToList`. If you don't, don't use `ToList` (and thus a second iteration will filter a second time). – mjwills Aug 19 '21 at 08:49
  • I suspect https://stackoverflow.com/questions/3628425/ienumerable-vs-list-what-to-use-how-do-they-work may be worth a read. And if you _really_ care about this, check out https://github.com/NetFabric/NetFabric.Hyperlinq . – mjwills Aug 19 '21 at 08:49
  • `A == C && B == D && Speed(A) > Speed(B) && Mem(A) < Mem(B)` ... No need to benchmark. –  Aug 19 '21 at 08:58
  • Does this answer your question? [IEnumerable Where() and ToList() - What do they really do?](https://stackoverflow.com/questions/23090459/) • [Deferred Execution of LINQ](https://www.tutorialsteacher.com/linq/linq-deferred-execution) • [Deferred vs Immediate execution in LINQ](https://www.c-sharpcorner.com/UploadFile/rahul4_saxena/deferred-vs-immediate-query-execution-in-linq/) • [Deferred execution and lazy evaluation](https://docs.microsoft.com/dotnet/standard/linq/deferred-execution-lazy-evaluation) • [LINQ (MSDocs)](https://docs.microsoft.com/dotnet/csharp/programming-guide/concepts/linq/) –  Aug 19 '21 at 09:01
  • @OlivierRogier You can't say that for sure since we have no idea what `CarList` is. In 99.9% of cases you are probably correct, but that object could be anything from a `List` to a network stream of data – DavidG Aug 19 '21 at 09:20
  • @DavidG Whatever it is though, creating a list _must_ be slower right? I am struggling to think of a 0.1% scenario where `ToList` could be faster... – mjwills Aug 19 '21 at 12:32
  • Also note that `ToList` in some edge cases can not only slow things down but _cause exceptions_. This is generally related to concurrent collections on .NET Framework. https://jeremyrsellars.github.io/no-new-legacy/posts/2016-10-21-concurrentdictionary-and-the-pit-of-success/ – mjwills Aug 19 '21 at 13:25
  • @mjwills I wasn't talking about that, more that we have no idea what each iteration of the `IEnumerable` loop could be doing. A contrived example would be if the stream was coming over a socket. It may be faster to grab all the data in a firehose type way than grab an item, do some processing, grab another item, and so on. – DavidG Aug 19 '21 at 13:58
  • @DavidG Fair point. – mjwills Aug 19 '21 at 22:29

1 Answer

0

Looking at this (admittedly old) answer: Does "foreach" cause repeated Linq execution?

It depends on the dataset to an extent, but because of how LINQ and IEnumerable work, A & C are the same in terms of functionality. Instead of executing the query in a single hit, the results are retrieved in a streaming manner, one by one. Each time the iterator calls MoveNext, the source is advanced until the next element that passes the Where filter is found, and that element is handed to the loop. Your example has no projection (Select); if one followed the Where, it would only be applied to the elements that pass the filter.
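To make the streaming behaviour concrete, here is a minimal sketch that counts how often the filter runs. It assumes an in-memory list; the `(Make, Model)` tuples and the sample cars are hypothetical stand-ins for the question's `Car` class and `CarList`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CarList: a list of (Make, Model) tuples.
var carList = new List<(string Make, string Model)>
{
    ("BMW", "M3"), ("Audi", "A4"), ("BMW", "X5"), ("Ford", "Focus"),
};

int predicateCalls = 0;

// Building the query runs nothing; the lambda is only stored.
var bmws = carList.Where(c =>
{
    predicateCalls++;
    return c.Make == "BMW";
});
int callsAfterBuilding = predicateCalls;

// First enumeration: the predicate runs once per source element.
int firstPassMatches = 0;
foreach (var car in bmws) firstPassMatches++;
int callsAfterFirstLoop = predicateCalls;

// Second enumeration: the whole filter runs again.
foreach (var car in bmws) { }
int callsAfterSecondLoop = predicateCalls;

Console.WriteLine($"{callsAfterBuilding} {callsAfterFirstLoop} {callsAfterSecondLoop}");
// prints "0 4 8"
```

A single `foreach` evaluates the filter once per source element, not once per element per iteration. The cost only doubles when the same deferred query is enumerated a second time.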

By calling .ToList() in examples B & D, you force the query to execute immediately and the results to be cached in a list. As for the "which one is better" question, that's where the answer becomes "it depends".
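One way to see the caching is to change the source after the query has been built. This is only a sketch under the assumption of an in-memory `List`; the tuple type and sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CarList.
var carList = new List<(string Make, string Model)>
{
    ("BMW", "M3"), ("Audi", "A4"),
};

// Deferred: re-reads carList every time it is enumerated.
IEnumerable<(string Make, string Model)> lazyBmws =
    carList.Where(c => c.Make == "BMW");

// Materialized: a separate List, filled once, right here.
List<(string Make, string Model)> cachedBmws =
    carList.Where(c => c.Make == "BMW").ToList();

// Change the source after both queries were set up.
carList.Add(("BMW", "X5"));

int lazyCount = lazyBmws.Count();   // sees the new car
int cachedCount = cachedBmws.Count; // snapshot from before the Add

Console.WriteLine($"lazy: {lazyCount}, cached: {cachedCount}");
// prints "lazy: 2, cached: 1"
```

So `ToList()` buys a stable snapshot and cheap re-enumeration at the price of an extra list allocation.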

If the data is already held in in-memory objects, A & C save a bit of memory and are slightly quicker than B & D, because they don't have to allocate and resize a list.

If you're querying a database, then A & C save on memory; however (you'd have to test this bit, because it seems hit and miss), it's possible that it would go back to the DB each time MoveNext is hit. On a small table it wouldn't make much difference, but I have encountered instances with large tables where creating a local list of the query results saved several minutes of execution time.

EDIT for clarity:

Adding in some pseudocode to elaborate on this point. The premise behind how A & C work is as follows:

  1. Look for an element that meets the criteria.
  2. Get the first element that meets the selection criteria.
  3. Do whatever is within the loop.
  4. Look for another element.
  5. Get the next element.
  6. Do whatever is within the loop.
  7. Repeat steps 4-6 until a result is not found.
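The steps for A & C can be traced with a rough hand expansion of `foreach` over the deferred query. This is a sketch, not the exact compiler output, and the tuple data is a hypothetical stand-in for `CarList`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CarList.
var carList = new List<(string Make, string Model)>
{
    ("BMW", "M3"), ("Audi", "A4"), ("BMW", "X5"),
};
var log = new List<string>();

var query = carList.Where(c =>
{
    log.Add($"test {c.Model}");
    return c.Make == "BMW";
});

// Roughly what `foreach (var car in query) { ... }` does:
// GetEnumerator is called once, MoveNext once per iteration.
using (var enumerator = query.GetEnumerator())
{
    while (enumerator.MoveNext())      // steps 1-2 and 4-5: find the next match
    {
        var car = enumerator.Current;
        log.Add($"body {car.Model}");  // steps 3 and 6: the loop body
    }
}

Console.WriteLine(string.Join(", ", log));
// prints "test M3, body M3, test A4, test X5, body X5"
```

The filter tests and the loop bodies are interleaved, and the query expression itself is evaluated only once, before the loop starts.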

Whereas B & D work more along the lines of the following:

  1. Find all elements that match the selection criteria.
  2. Create a list from the results of step 1.
  3. Assign a pointer that points at the first element in the list.
  4. Do the code within the loop.
  5. Move the pointer to the next item in the list.
  6. Do the code within the loop.
  7. Repeat steps 5 and 6 for all items in the list.
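The B & D ordering can be traced by logging from inside the filter and the loop body. As before, this is a sketch with hypothetical tuple data standing in for `CarList`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CarList.
var carList = new List<(string Make, string Model)>
{
    ("BMW", "M3"), ("Audi", "A4"), ("BMW", "X5"),
};
var log = new List<string>();

// Steps 1-2: ToList() runs the whole filter now and stores the matches.
var matches = carList
    .Where(c =>
    {
        log.Add($"test {c.Model}");
        return c.Make == "BMW";
    })
    .ToList();

// Steps 3-7: the loop only walks the already-built list.
foreach (var car in matches)
{
    log.Add($"body {car.Model}");
}

Console.WriteLine(string.Join(", ", log));
// prints "test M3, test A4, test X5, body M3, body X5"
```

All filter tests finish before the first loop body runs, which is the extra up-front work (and the extra list) that B & D pay for.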

A more real-life scenario that roughly explains it is going shopping. If you have the shopping list in your hand (B & D), because you've already spent the time figuring out what you need, then you just look at the list and grab the next item. If you don't have the shopping list (A & C), then you have the extra step in the store of thinking "what do I need?" before retrieving each item.

Andrew Corrigan
  • `it's possible that it'd go back the DB each time the MoveNext is hit` But wouldn't B & D be impacted by this same cost? Why do A & C incur that cost but B & D don't? _Both_ of them are calling `MoveNext`. – mjwills Aug 19 '21 at 12:34
  • I'll edit this in in a moment for clarity; but in essence the ToList makes it go to the database and grab all the results and stores them locally, before doing the loop. I'll edit in some pseudocode to explain it a bit further. – Andrew Corrigan Aug 19 '21 at 13:02
  • 1
    I suspect you will find `foreach`ing over it will act similarly in most cases since the DB layer will do batch requests (not a row at a time). https://learn.microsoft.com/en-us/ef/core/performance/efficient-querying#internal-buffering-by-ef – mjwills Aug 19 '21 at 13:21
  • @mjwills I hadn't seen that one; that must be why I've seen it work well in some environments but not on others. It hadn't occurred to me beforehand, but now you've shown me that, I'm remembering that the places where I'd seen the query repetitively hit the DB in execution of a for loop were places that didn't use entity framework for the connection. – Andrew Corrigan Aug 19 '21 at 13:37
  • 1
    You're making an assumption that this is an EF (or similar) query. We just don't know. – DavidG Aug 19 '21 at 13:54
  • @DavidG, aye, that's why I tried to keep my answer somewhat open - in some scenarios, it does hit the database on each 'MoveNext' while in other scenarios it doesn't. It really depends on where it's getting the list of results from and how it establishes the connection if it is an external datasource. By giving information in how the 'ToList' makes the difference of forcing the LINQ to be not-lazy-loaded for want of a better phrase, it at least highlights where the noticeable differences would be. – Andrew Corrigan Aug 19 '21 at 14:17