
So it is my understanding that LINQ does not execute everything immediately; it simply stores the information needed to get at the data. So if you do a Where, nothing actually happens to the list; you just get an IEnumerable that has the information it needs to become the filtered list.

One can 'collapse' this information to an actual list by calling ToList.
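For example, as I understand it (a minimal sketch; the query and variable names are just for illustration):

using System;
using System.Linq;

// Where only builds a query here; the filter has not run yet.
var query = Enumerable.Range(1, 10).Where(i => i % 2 == 0);

// ToList() runs the query once and copies the results into a List<int>.
var snapshot = query.ToList();

// Iterating the snapshot does not re-run the filter.
Console.WriteLine(string.Join(", ", snapshot)); // 2, 4, 6, 8, 10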

Now I am wondering, why would the LINQ team implement it like this? It is pretty easy to add a List at each step (or a Dictionary) to cache the results that have already been calculated, so I guess there must be a good reason.

This can be checked with the following code:

// The filter logs every time it runs, i.e. every time the query is enumerated.
var list = Enumerable.Range(1, 10).Where(i => {
    Console.WriteLine("Enumerating: " + i);
    return true;
});

// All and Any each enumerate the query, so the filter runs again for each call.
var list2 = list.All(i => {
    return true;
});

var list3 = list.Any(i => {
    return false;
});

If the cache were there, it would output Enumerating: i only once for each number; the second time around it would get the items from the cache.
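As it stands, the actual output repeats every line, because All and Any each enumerate the Where query from the start:

Enumerating: 1
Enumerating: 2
...
Enumerating: 10
Enumerating: 1
Enumerating: 2
...
Enumerating: 10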

Edit: Additional question, why does LINQ not include a cache option? Like .Cache() to cache the result of the previous enumerable?
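To make the idea concrete: Cache is not an existing LINQ operator; something along the lines of this lazily memoizing extension method is what I have in mind (not thread-safe, just a sketch of the idea):

using System.Collections.Generic;

public static class CachingExtensions
{
    // Hypothetical operator: items are cached the first time they are pulled from
    // the source, so later enumerations replay the cache instead of re-running
    // the query. Not thread-safe; only meant to illustrate the idea.
    public static IEnumerable<T> Cache<T>(this IEnumerable<T> source)
    {
        var buffer = new List<T>();
        var sourceEnumerator = source.GetEnumerator();

        IEnumerable<T> Replay()
        {
            int index = 0;
            while (true)
            {
                if (index < buffer.Count)
                {
                    // Already seen: serve from the cache.
                    yield return buffer[index++];
                }
                else if (sourceEnumerator.MoveNext())
                {
                    // Not seen yet: pull one item from the source and cache it.
                    buffer.Add(sourceEnumerator.Current);
                    yield return buffer[index++];
                }
                else
                {
                    yield break;
                }
            }
        }

        return Replay();
    }
}

With something like this, the check above would print each Enumerating: i line only once, however many operators enumerate the cached sequence afterwards.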

vrwim
  • What happens if you're building a list against a data source (i.e. a database)? Do you really want it to evaluate the result of every enumeration immediately? Most often the answer is no. – Yuck May 25 '16 at 13:00
  • @Yuck No, not immediately, but once it has "calculated" the values, why not cache them and reuse them instead of calculating them again? – vrwim May 25 '16 at 13:01
  • Well, isn't it nice that they let you decide what you want? It is as simple as doing Enumerable.Range(1, 10).ToList(); if you need to iterate over the list more than once. – Peter Bons May 25 '16 at 13:01
  • Indeed - it's much easier to explicitly cache than it is to work around caching occurring when you *don't* want it to. – Jon Skeet May 25 '16 at 13:02
  • @PeterBons Yes, that is a good point, but then I enumerate the entire list. What if I am constantly accessing the same 5 elements in a list of 30000? – vrwim May 25 '16 at 13:03
  • Who knows whether you will go over the results multiple times? @vrwim – Patrick Hofman May 25 '16 at 13:03
  • @vrwim Then do a .Where on the elements you need the most and do a .ToList() on that subset (see the snippet after these comments). Again, I like it that I am in control of when things happen. The way it is set up now, you can decide. If the inner workings called .ToList() for me, I would not be able to influence the 'when'. – Peter Bons May 25 '16 at 13:05
  • Also have a read of this thread around Deferred Execution: http://stackoverflow.com/questions/7324033/what-are-the-benefits-of-a-deferred-execution-in-linq I also find the tools in System.Runtime.Caching a good companion: https://msdn.microsoft.com/en-us/library/system.runtime.caching(v=vs.110).aspx – Murray Foxcroft May 25 '16 at 13:06
  • As an aside, please don't become one of those coders who tacks `.ToList()` on to the end of every single statement involving an `IEnumerable`. – Yuck May 25 '16 at 13:20
  • @Yuck I don't do it now and I don't want to do it, but I want to reuse items I know will be accessed multiple times. Guess I'll be making my own IEnumerable extension method with caching – vrwim May 25 '16 at 13:26
  • @Yuck: While I was reading this question I found your comment "please don't become one of those coders who tacks .ToList() on to the end of every single statement involving an IEnumerable", which relates to me because I am always doing this. Can you please tell me what you mean by that statement, and when to do .ToList() and when not to? – I Love Stackoverflow Jun 20 '16 at 10:05
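As a quick sketch of the .Where + .ToList() subset approach Peter Bons describes above (the filter and sizes are made up for illustration):

using System;
using System.Linq;

// Stand-in for a large sequence (the 30,000-item list from the comments).
var source = Enumerable.Range(1, 30000);

// Materialize only the small subset that is accessed over and over;
// the query runs exactly once, right here.
var hotItems = source
    .Where(i => i % 7 == 0)   // made-up filter standing in for "the elements you need the most"
    .Take(5)
    .ToList();

// Repeated access now hits the small list, not the original query.
Console.WriteLine(string.Join(", ", hotItems)); // 7, 14, 21, 28, 35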

2 Answers


It is pretty easy to add a List at each step

Yes, and very memory-intensive. What if the data set contains 2 GB of data in total and you have to store all of that in memory at once? If you iterate over it and fetch it in parts, you don't have a lot of memory pressure. When you pull 2 GB into memory you do, not to mention what happens if every step in the chain does the same...

You know your code and your specific use case, so only you as the developer can determine when it is useful to materialize some of the intermediate results in memory. The framework can't know that.
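To make the contrast concrete, a rough sketch (the numbers are arbitrary): summing a generated sequence item by item keeps only one element alive at a time, while materializing it first holds everything in memory at once.

using System;
using System.Linq;

// 100 million values streamed one at a time: only one element is alive at any moment.
long streamedSum = Enumerable.Range(1, 100_000_000)
    .Select(i => (long)i)
    .Sum();

// The same values materialized first: roughly 800 MB sits in memory
// before the summing even starts.
long materializedSum = Enumerable.Range(1, 100_000_000)
    .Select(i => (long)i)
    .ToList()
    .Sum();

Console.WriteLine(streamedSum == materializedSum); // True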

Patrick Hofman
  • What if I am constantly accessing the same 5 elements in a list of 30000? The cache would only keep the 5 elements in memory. – vrwim May 25 '16 at 13:05
  • But who knows that? It is very hard to build an efficient caching mechanism, especially when you don't know its use case. – Patrick Hofman May 25 '16 at 13:06

Because it makes no sense, and if you thought about all the cases where it makes no sense, you would not ask. This is not so much a "does it sometimes make sense" question as an "are there side effects that make it bad" question. Next time you evaluate something like this, think about the negatives:

  • Memory consumption goes up because you HAVE to cache the results, even when that is not wanted.
  • On the next run, the results may be different because the incoming data may have changed. Your simplistic example (Enumerable.Range) has no issue with that, but filtering a list of customers may find them updated in the meantime (see the example below).

Stuff like that makes it very hard to sensibly take away the choice from the developer. Want a buffer? Make one (easily). But the side effects would be bad.
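As a small illustration of the second point above, using an in-memory list as a stand-in for changing incoming data:

using System;
using System.Collections.Generic;
using System.Linq;

var customers = new List<string> { "Alice", "Bob" };

// Deferred query: the filter runs fresh on every enumeration.
var startsWithA = customers.Where(name => name.StartsWith("A"));

Console.WriteLine(startsWithA.Count()); // 1

customers.Add("Aaron");

// Re-enumerating sees the new data; an implicit cache built on the first
// enumeration would still report 1.
Console.WriteLine(startsWithA.Count()); // 2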

TomTom
  • That makes sense. But why does LINQ not have a `.Cache()` option? – vrwim May 25 '16 at 13:06
  • It has, it is called `ToList()`, `ToArray()` and `ToDictionary()` @vrwim – Patrick Hofman May 25 '16 at 13:07
  • Nope, ToList enumerates the entire list; I want an option that caches the items I have accessed. What if I only ever access 5 elements in a 30000-item list? I know I could make it myself, but why doesn't it exist already in the framework? – vrwim May 25 '16 at 13:08
  • @vrwim What exactly would you expect a `Cache` method to do that `ToList` doesn't? If you want to cache the 5 items then it has to first iterate the entire list to find them. – juharr May 25 '16 at 13:11
  • I want it to cache the elements I have accessed, to prevent the entire `IEnumerable` from being calculated multiple times. `ToList` just calculates all items, even if I never access them. In my example these could have been the first 5 items, to point out the obvious performance improvement. – vrwim May 25 '16 at 13:22
  • I would expect it to cache *lazily*, as enumeration happens. – daniel.gindi Jun 23 '22 at 07:00