7

I'm trying to figure out what would be the proper convention for LINQ when I need to do something like the following

  • If there items, print them line-by-line
  • If there are no items, print "No items"

The way I would think to do it is like

if (items.Any())
{
    foreach (string item in items)
    {
        Console.WriteLine(item);
    }
}
else
{
    Console.WriteLine("No items");
}

However, that would technically violate the principle of multiple enumeration. A way to not violate that would be

bool any = false;
foreach (string item in items)
{
    any = true;
    Console.WriteLine(item);
}   
if (!any)
{
    Console.WriteLine("No items");
}

but clearly, that is less elegant.

FaizanHussainRabbani
  • 3,256
  • 3
  • 26
  • 46
user7127000
  • 3,143
  • 6
  • 24
  • 41
  • 2
    You can just `.ToList()` the IEnumerable, check Count and then foreach if Count > 0. – maccettura Feb 22 '18 at 16:33
  • 2
    @maccettura what would be the use of doing that? – Fredou Feb 22 '18 at 16:34
  • The pros are you only enumerate once, the drawback is you have committed that to memory. – maccettura Feb 22 '18 at 16:36
  • 1
    I'm ignorant and don't understand why using `.Any()` would be bad. It returns immediately if `.MoveNext()` is successful on the enumerator. Doesn't this mean it doesn't iterate the whole collection and would be fast? Is it/Why is it bad? – Equalsk Feb 22 '18 at 16:40
  • 2
    @Equalsk if the ienumerable does an actual action or is VERY costly or it's a forward only, you dont want to mess thing up – Fredou Feb 22 '18 at 16:41
  • 5
    I'd probably `.ToList()` but honestly I can't see what's objectionable with your "less elegant" solution. – Stephen Kennedy Feb 22 '18 at 16:44
  • 4
    This just looks like an attempt at optimization when there is no clear performance problem. Your first example is much more readable and maintainable compared to the second. – Erik Philips Feb 22 '18 at 16:57
  • @ErikPhilips You and I are on the same page today, +1 – maccettura Feb 22 '18 at 17:02
  • 1
    As programmers, our First goal should be to write readable/maintainable code that solves the problem. Trying to pre-optimize code is a very bad idea unless you absolutely know you're trying to write performant code because of requirements. [Otherwise, don't write overly complex code for what you *think* might help performance, because many times the compiler is smarter than you and you can actually hinder performance](https://blogs.msdn.microsoft.com/audiofool/2007/06/14/the-rules-of-code-optimization/). – Erik Philips Feb 22 '18 at 17:06
  • 3
    @ErikPhilips I read this question as focusing on correctness according to a specific guideline, not on performance. That guideline has multiple reasons for it, performance being one but not the only one. And here, the compiler isn't allowed to avoid multiple enumerations. It would be contrary to the C# language spec. So your comment about trusting the compiler to take care of it is very much misplaced here. –  Feb 22 '18 at 17:07
  • 1
    So we'll just ignore JetBrains' advice then. OK. – Stephen Kennedy Feb 22 '18 at 17:08
  • 2
    @StephenKennedy You mean the warning? Yeah I'd ignore it without a second thought. – Erik Philips Feb 22 '18 at 17:11
  • 1
    @hvd So *besides performance (cpu or memory)* please enlighten me on what the code does better, or how any other developer might think that the second example is better than the first? – Erik Philips Feb 22 '18 at 17:13
  • You want to check before iterating if the enumerable is empty. You also want to enumerate once. Those two contradict each other in my opinion. However, I guess youcould write your own extension method which calls an action/func if there are no items in it. Would that be an acceptable solution? – default Feb 22 '18 at 17:14
  • 1
    @ErikPhilips With the second code, either it prints the items, or it prints `"No items"`. With the first code, it's possible that nothing gets printed at all. –  Feb 22 '18 at 17:15
  • 1
    @juharr I don't understand why this isn't a duplicate of [this question](https://stackoverflow.com/questions/168901/howto-count-the-items-from-a-ienumerablet-without-iterating). Can you please elaborate on why this is different? Given the length of the comments this post now appears to be opinion based instead... – Equalsk Feb 22 '18 at 17:17
  • @hvd I guess with bad threading code, but then the second one could print "No Items" when there are items. Other than threading, can you provide an example via a DotNetFiddle? I am genuinely curious. – Erik Philips Feb 22 '18 at 17:18
  • 2
    @ErikPhilips If `items` is backed by a database, it's possible that someone else removes the last item after `items.Any()` returned true, but before `foreach (var item in items)` started. Is that good enough or do you really need an online sample for that? –  Feb 22 '18 at 17:23
  • @hvd Same for the second example. Those are just [Vexing Exceptions](https://blogs.msdn.microsoft.com/ericlippert/2008/09/10/vexing-exceptions/) as programmers we have to deal with. – Erik Philips Feb 22 '18 at 17:24
  • @Equalsk Honestly this seems more like a code review (but opinion based makes sense as we are dealing with how "elegant" the code is). Also that question dealt with counting an IEnumerable where as this is strictly dealing with multiple enumerations when using `Any`. This would be a closer dupe IMHO https://stackoverflow.com/questions/8240844/handling-warning-for-possible-multiple-enumeration-of-ienumerable – juharr Feb 22 '18 at 17:24
  • @ErikPhilips No, not the same for the second code. The second code will always produce self-consistent results. It's possible that the results will already have been changed by the time they get printed, but they were at one point correct. That's not the case for the first code. –  Feb 22 '18 at 17:25
  • @hvd While the code produces the same result, the out come is that I can add a record to the database before "No Items" printed in the second example.. which is similar problem. However, the problem is not because of the implementation of the `items` but the design not to instantiate the list a *single time*. That is bad programming that allows the first example to fail, it fails because prior code was designed incorrectly. – Erik Philips Feb 22 '18 at 17:29
  • 1
    @ErikPhilips Again, no. With the second code, you *never* get an inconsistency between the printing of `"No items"` and the printing of items. If `"No items"` gets printed, no items get printed. As for "the design not to instantiate the list a *single time*", that is exactly what the warning is about. Arguing that instantiating the list multiple times is bad programming is arguing that the warning is in fact correct. –  Feb 22 '18 at 17:33
  • @ErikPhilips When I posted to explain that OP's code _could_ result in 2 calls to get an enumerator, in direct response to his line about "principle of multiple enumeration" (the *stated problem*) and posted a link which explains why Resharper warns against such code, you complained about comment spam and irrelevancy. Honestly, I was _upset_ by your comment. Now there are half a dozen or more gladiatorial comments from you here, which seems rather hypocritical from somebody so anti comment-spam. I don't need a reply. – Stephen Kennedy Feb 22 '18 at 17:49
  • 1
    how to answer a question that is self answering? second one is the way to go – Fredou Feb 22 '18 at 18:24
  • 1
    @Equalsk To answer your question of how this is different, this question is asking how to figure out if there were any items in a sequence *you have already finished iterating*. The other question is asking how to figure out how many items are in a sequence *without iterating it at all*. Figuring out how many items are in a sequence iterating it *once* is slightly different than doing so without iterating it at all. – Servy Feb 22 '18 at 18:52
  • @Equalsk Because the answers are different. – NetMage Feb 22 '18 at 19:16
  • @Default or you could use the existing built-in method :) – NetMage Feb 22 '18 at 19:17
  • @Servy That made sense, thank you for explaining. – Equalsk Feb 22 '18 at 19:18

2 Answers2

4

Since we are talking LINQ, how about a very LINQ solution?

foreach (var item in items.DefaultIfEmpty("No items"))
    Console.WriteLine(item);
NetMage
  • 26,163
  • 3
  • 34
  • 55
  • i totally forgot about that extension – Fredou Feb 22 '18 at 19:09
  • DefaultIfEmpty calls MoveNext, this is no different than using Any. – Travis J Feb 22 '18 at 19:37
  • @TravisJ Think again, `DefaultIfEmpty` calls `MoveNext` only once, using `Any`/`foreach` creates two iterators and calls `MoveNext` twice. So using `DefaultIfEmpty` is the same as using just `foreach`. [Enumerable.DefaultIfEmptyIterator](https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,1ff6169a97a478bf) – NetMage Feb 22 '18 at 19:40
  • @TravisJ No. `DefaultIfEmpty` *doesn't* iterate the source sequence twice when the resulting iterator is iterated. All of the LINQ operators take great pains to ensure that the source sequence is *never* iterated multiple times. You can see the source code of the method to see how they accomplished that, if you don't know how you'd write that method without iterating the source sequence multiple times. – Servy Feb 22 '18 at 19:49
  • @NetMage `DefaultIfEmpty` calls `MoveNext` until it returns false, not just once. But yes, it doesn't get multiple enumerators; it gets just one and implements it's intended semantics using just that one iterator. – Servy Feb 22 '18 at 19:50
  • Any doesn't iterate the entire sequence. Regardless, if the point was not to avoid calling MoveNext, then this is a grotesque exercise in micro optimization. N versus N+1 is not worth wasting time on. – Travis J Feb 22 '18 at 19:54
  • 2
    @TravisJ Indeed, `Any` *doesn't* iterate the entire sequence. But it does iterate the first item. If you then `foreach` over it later you've now iterated over the source sequence twice. This can cause all sorts of problems. Iterating the source sequence (even if only to get the first item) could cause side effects, it could be prohibitively expensive (many sequences need to do a lot of work before the first item can be retrieved), it could produce different results when iterated subsequent times, or any number of other possibilities. – Servy Feb 22 '18 at 19:59
  • For any such sequence ensuring that it's only iterating the sequence once is essential for the *correctness* of the program, not just the performance of it, and even for those where it's only about the performance, it can still be *extremely* relevant for those cases where the sequence itself does expensive work. – Servy Feb 22 '18 at 20:01
  • 1
    Apples and oranges. If that were the case, and you knew that the sequence was only generated through a complex action, then it should be guaranteed to have entries. Moreover, in this ridiculously complex enumerable situation, it would more than likely be a custom implementation, and should summarily have also protected itself from this situation by having its own enumerator cached. While this approach is certainly a valid way of doing this, it is not a grandiose improvement. – Travis J Feb 22 '18 at 20:06
  • 3
    @TravisJ The question is *specifically* asking how to solve the problem that they have without iterating the source sequence twice. Presumably that's because they know that they have a sequence that they cannot iterate multiple times. This complex situation probably *doesn't* involve any custom iterator. The most common cases involving these types of sequences are when using any LINQ query provider to interact with a database. If you feel that EF (or basically every other query provider out there) is implemented wrong for not caching every query, feel free to mention it to MS. – Servy Feb 22 '18 at 20:13
  • @TravisJ Consider if `items` is `File.ReadLines()`, `Any` will have the overhead of opening a (network?) file and reading it in. – NetMage Feb 22 '18 at 20:21
  • @Servy "only once" didn't mean one time, it meant on one iterator, but perhaps that should have just said using `DefaultIfEmpty` only creates one iterator. – NetMage Feb 22 '18 at 20:23
  • @NetMage More concerning than the cost of opening up the file is the possibility that the content of the file is changed between the first and second iteration. Then it becomes a correctness problem, not *just* a performance problem. – Servy Feb 22 '18 at 20:33
  • @Servy Couldn't that also be a problem while the second (or only) iteration is running? It's not like the files contents are cached by the iterator... – NetMage Feb 22 '18 at 20:36
  • @NetMage It'll have a lock out on the file while the reader is open, preventing any other process from changing it. But when it closes and then reopens the lock anything can have happened. – Servy Feb 22 '18 at 20:39
  • @Servy - Might want to read up: https://msdn.microsoft.com/en-us/data/hh949853.aspx ... query caching has existed for years. – Travis J Feb 22 '18 at 21:10
  • @TravisJ Query Plan Caching and Metadata Caching aren't the same as results caching, which that article says "While results caching isn't directly supported by the Entity Framework"... – NetMage Feb 22 '18 at 21:19
  • @NetMage - By default EF caches query results, in order to avoid that you would need to use the AsNoTracking flag. – Travis J Feb 22 '18 at 21:21
  • 1
    @TravisJ The context caches row objects, which I would suggest is not the same thing as query results. – NetMage Feb 22 '18 at 21:26
  • It keeps a collection of of the results while the context is open unless specifically asked not to. It will lazy load any results not originally queried for, as would be expected. In other words, it is caching the result of the original query. You can even inspect them if you want, and subsequent queries of the same original query will be faster if done on the same context while open. – Travis J Feb 22 '18 at 21:33
  • 1
    @TravisJ And as NetMage already just told you, the fact that it caches the *objects* doesn't mean it's caching the query or its results. It's *not* caching the results of the query. Each time you iterate the source sequence it's executing a query against the database and getting the updated results. It may be re-using some of the objects in those results, rather than re-creating them, but it's still executing the query each time. – Servy Feb 22 '18 at 21:38
-1

This may or may not be a problem, depending on your use case, because Any() will short-circuit as soon as the condition is met, meaning the entire IEnumerable doesn't need to be enumerated.

Note the comments below which point out potential pitfalls such as the implementation being forward only or expensive.

Here's the reference source:

public static bool Any<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    using (IEnumerator<TSource> e = source.GetEnumerator()) {
        if (e.MoveNext()) return true;
    }
    return false;
}
Owen Pauling
  • 11,349
  • 20
  • 53
  • 64
  • 2
    what if the implementation is a forward only and does an action or is expensive? doing .Any() could be problematic – Fredou Feb 22 '18 at 16:49
  • @Fredou then don't use .Any() in that use case – Owen Pauling Feb 22 '18 at 16:50
  • Won't it enumerate a second time when the `foreach` is entered? – Stephen Kennedy Feb 22 '18 at 16:52
  • 1
    @StephenKennedy of course it will. But the first enumeration is only the first item. – Owen Pauling Feb 22 '18 at 16:53
  • 1
    And Resharper warns against that because you don't know how expensive that second `GetEnumerator` call (where the sequence is not empty) will be https://resharper-support.jetbrains.com/hc/en-us/community/posts/206012289-what-does-Possible-multiple-enumerations-of-IEnumerable-mean- – Stephen Kennedy Feb 22 '18 at 16:58
  • 2
    This definitely is really a problem. Not only because even `.Any()` could be expensive, but because enumerating twice may produce different results. –  Feb 22 '18 at 16:59
  • 3
    Too many *what if's* with no relevancy to the actual question. There is no stated performance problem. There is no stated custom list type. Way too much comment spam. – Erik Philips Feb 22 '18 at 17:00
  • Added a warning to the answer in response to comments. – Owen Pauling Feb 22 '18 at 17:02
  • 1
    The question is *specifically* asking how to avoid iterating the sequence twice. Saying, *don't bother* is just wrong, especially because it's so often important. The question isn't asking if it's a problem to iterate a sequence twice, or how expensive `Any` is. It's *stating* that they have a requirement not to iterate a sequence twice (again, a common enough requirement to have) and asking how to achieve that. Saying you don't care about a requirement they have isn't an answer. – Servy Feb 22 '18 at 19:29