2

I have a simple method like this:

private void CalculateTotals(IEnumerable<MyData> data)
{
   decimal value1 = data.Sum(d => d.Value1);
   decimal value2 = data.Sum(d => d.Value2);
}

This works and allows for any type of collection to be passed in, but this will as well perform multiple enumeration of data that can be quite costly as well.

Now I can change the type of parameter to ICollection<MyData> but that will mean that the caller does not know if the method can/will change the collection since it's possible to call Add/Remove on it.

My next guess is IReadonlyCollection<MyData> that seems most fitting, but that will mean every caller will have to create/transform to a new ReadonlyCollection.

Are there any "preferred" ways of solving this in .Net?

Ilya Chernomordik
  • 27,817
  • 27
  • 121
  • 207
  • `IReadOnlyList` or `IReadOnlyCollection`. Note you don't need to pass in a read only collection etc - any old list will do (since `List` implements both interfaces). – mjwills Apr 08 '19 at 11:38
  • Is this just an example or are you interested in that specific code? – MakePeaceGreatAgain Apr 08 '19 at 11:39
  • 2
    In cases where side-stepping the issue isn't possible or not economical (as one of the answers shows, this one can be solved with `Aggregate`), your best bet probably *is* `IReadOnlyCollection` (or `IReadOnlyList` if you need access by index rather than simply enumeration). The objection that the caller will have to potentially create a new collection is valid, but it just reflects the actual, real cost of having to have arbitrary access to the elements. It's a material difference that cannot be swept under the rug (imagine passing in an infinite `IEnumerable` to this function). – Jeroen Mostert Apr 08 '19 at 11:40
  • 1
    Why do you feel that `IReadonlyCollection` resolves the double enumeration? – Enigmativity Apr 08 '19 at 11:55
  • The question is about performance or about security against unauthorized modification of the data by the caller? – Theodor Zoulias Apr 08 '19 at 12:00
  • @HimBromBeere Just an example, I have more complicated cases than that – Ilya Chernomordik Apr 08 '19 at 12:04
  • @TheodorZoulias It's about performance (so that e.g. SQL will not run twice) – Ilya Chernomordik Apr 08 '19 at 12:04
  • @Enigmativity By enumeration I mean going through IEnumerable. If we talk about scan it would of course needs to go through collection 2 times anyway (unless we make it smarter, but I want enumeration, not to code my own loop that does all I need in one go) – Ilya Chernomordik Apr 08 '19 at 12:05
  • If it is about performance then the type of the wrapper around not-buffered data is less relevant. The important thing is to avoid enumerating more than once. You can actually enforce this, to disallow enumerating twice by mistake. See a [question](https://stackoverflow.com/questions/55565763/restricting-the-enumerations-of-linq-queries-to-one-only) I made today about this. – Theodor Zoulias Apr 08 '19 at 12:15
  • Btw if you hit the database every time you enumerate the query results, you don't have only a performance problem. You have a problem of data consistency as well. Value1 and Value2 could be calculated over a different set of data. – Theodor Zoulias Apr 08 '19 at 12:21
  • Yes, that's exactly what I want to prevent – Ilya Chernomordik Apr 08 '19 at 12:25

4 Answers4

5

My next guess is IReadonlyCollection<MyData> that seems most fitting, but that will mean every caller will have to create/transform to a new ReadonlyCollection.

Well no, because various built-in types implement that interface:

But instead you might want to prevent the double enumeration.

CodeCaster
  • 147,647
  • 23
  • 218
  • 272
  • Still even if I prevent the enumeration the caller would not know this. Should I accept `IReadonlyCollection` to make it clear that one cannot pass just an `IEnumerable` or is it the caller responsibility to decide? – Ilya Chernomordik Apr 08 '19 at 12:18
  • I don't see how these types avoid double enumeration, per se. Any of then could still be enumerated twice. – Enigmativity Apr 08 '19 at 12:30
  • @Enigmativity they don't. They just make sure that they're in memory, as opposed to an IQueryable, for example, which will cause a query to be executed twice. – CodeCaster Apr 08 '19 at 12:43
  • @CodeCaster - Ah, I see your point. I didn't get that from the answer though. – Enigmativity Apr 08 '19 at 12:44
  • You will need to iterate twice anyway if you write this type of LINQ, but the difference is "lazy" linq vs one that is already in memory. That can make a huge difference – Ilya Chernomordik Apr 08 '19 at 12:45
3

This gives you a single enumeration:

private void CalculateTotals(IEnumerable<MyData> data)
{
    (decimal value1, decimal value2) =
        data.Aggregate((0m, 0m), (a, x) => (a.Item1 + x.Value1, a.Item2 + x.Value2));
}
Enigmativity
  • 113,464
  • 11
  • 89
  • 172
  • I love elegant, modernist and readable solutions. +1 for that – ilkerkaran Apr 08 '19 at 11:38
  • 2
    While this may work in the specific usecase, it´s not a general answer to "which type of collection should I use". – MakePeaceGreatAgain Apr 08 '19 at 11:39
  • @HimBromBeere - It seemed to me that the OP asked that because he wanted a type that would allow a single enumeration. – Enigmativity Apr 08 '19 at 11:46
  • This is a lot less readable, so I hope it is a lot faster to compensate for the readability loss. – Theodor Zoulias Apr 08 '19 at 11:56
  • @TheodorZoulias - I'd be interested to see a more readable solution that avoids the double enumeration. – Enigmativity Apr 08 '19 at 11:58
  • This was just an example really, so it can be something else that does other types of enumerations that do not allow to do it this easy – Ilya Chernomordik Apr 08 '19 at 12:02
  • @TheodorZoulias: compared to enumerating twice, it's twice as fast. It can even make a difference for the result if your enumerable isn't guaranteed to give the same results every time it's enumerated. That may matter if your enumerable takes 23 minutes to enumerate because it downloads things from a server in Antarctica behind the scenes -- or not at all because you only ever call it with small arrays (but then, making it take an array explicitly is probably better). If you don't like `Aggregate`, it can always be replaced with an explicit `foreach` and some locals. – Jeroen Mostert Apr 08 '19 at 12:03
  • 1
    @Enigmativity the double enumeration should be avoided only if it's costly. If it isn't, just leave it as is. You need 3 seconds to understand the original code with the two `Sum`s, and more than 15 to understand the optimized version with the `Aggregate`. – Theodor Zoulias Apr 08 '19 at 12:04
  • @TheodorZoulias - That's not the point of this question. He's trying to avoid double enumeration. He's not making the distinction on the cost of enumerating nor the readability, just to avoid enumerating more than once. – Enigmativity Apr 08 '19 at 12:29
  • @Enigmativity you are right, it's now clear. The title of the question is a bit misleading because it implies that the double enumeration could be tolerated, and somehow alleviated by optimizing the type of the collection... – Theodor Zoulias Apr 08 '19 at 12:42
  • @TheodorZoulias - I think he's used the word "accept" when he meant "except". – Enigmativity Apr 08 '19 at 12:46
3

Generelly I would suggest to use the least greedy interface possible that fits your needs. So if your method needs to iterate, just use the most generic one which is IEnumerable<T>. If you need adding/removing use an ICollection, if you need index-based access use an IList.

Independent from what callers should provide, you should optimize the internals of your method to avoid mulitple iterations - which you usually achieve by materializing the collection in some way. Don´t make your callers care for the internals of your method. Those are implementation-details.

Best you also check if your collection allready is materialized by casting to list or array first:

var dataAsList = (data as IList<T>) ?? data.ToList();
MakePeaceGreatAgain
  • 35,491
  • 6
  • 60
  • 111
  • You mean it's a callers responsibility to ensure to not send IEnumerable that can e.g. execute a new SQL query anywhere? – Ilya Chernomordik Apr 08 '19 at 12:08
  • I can imaging there can be different needs as e.g. the one I wrote and e.g. a filtering method that will not enumerate at all. How would you know the difference? – Ilya Chernomordik Apr 08 '19 at 12:08
  • I mean it´s the callers responsibility to provide the right items, and the members resp. to handle them in a performant way. Don´t mix those two. Apart from this I can´t imagagine what you say by "filter without iterating". – MakePeaceGreatAgain Apr 08 '19 at 12:13
  • return data.Where(d => d.IsActive == true) e.g. will return IEnumerable without iterating over it – Ilya Chernomordik Apr 08 '19 at 12:15
  • But why should it make any difference to **the caller** what the method does internally? If it needs iterating the collection or not shouldn´t have any effect on the caller-site. – MakePeaceGreatAgain Apr 08 '19 at 12:18
  • @IlyaChernomordik: There is no simple, agreed-upon way of communicating the fact that a method will or will not consume an `IEnumerable` if it itself returns an `IEnumerable` (if it returns anything else, it's a safe bet that it does, because how else would it work?). While it's a material difference, that's an example of something you just have to "know" (or have guaranteed by the docs). If anything, it proves that there's more than one way to handle this problem -- "I may or may not enumerate more than once, deal with it" is feasible, if not necessarily desirable. – Jeroen Mostert Apr 08 '19 at 12:25
  • @JeroenMostert So you suggest that the caller should care and I should just accept IEnumerable? – Ilya Chernomordik Apr 08 '19 at 12:27
  • @IlyaChernomordik: no -- I'm suggesting there is no one, correct, "this is definitely the right approach in all circumstances" answer. The one thing that *is* true is that not enumerating more than once in your implementation (if possible) matters more than the type you use in your interface, and that if you do enumerate more than once your are up front about this to callers. The type used in the call can't be used to accurately convey all the ways in which your implementation might be using it; this is true no matter the type. What's left is "managing expectations". – Jeroen Mostert Apr 08 '19 at 12:29
-3

You have to sum. You have a collection. You have to enumerate.

End of the story.

So, IEnumerable works fine.

Baum mit Augen
  • 49,044
  • 25
  • 144
  • 182
Liquid Core
  • 1
  • 6
  • 27
  • 52
  • 1
    What I ment by my initial comment is this: OP wants to know how to best handle the multiplie-iterations issue. You say: you won´t need to, which surely is not a good answer. That´s why I donvoted - your answer just doesn´t solve the problem. So "downvote like crazy with no reason" is simply whrong. – MakePeaceGreatAgain Apr 08 '19 at 11:53
  • 3
    `IEnumerable` does not necessarily work fine, at least not without code changes. Some on-the-fly enumerables can't be enumerated more than once (`DataReader`), or are prohibitively expensive to enumerate more than once (think arbitrary LINQ queries). This is exactly the issue the OP has, and your answer doesn't address those concerns at all. – Jeroen Mostert Apr 08 '19 at 11:53