3

We've had a little introspection over at Code Review of LINQ Enumerable and discovered different strategies for iterating a IEnumerable<TSource> to yield us some information about it.

Any

public static bool Any<TSource>(
     this IEnumerable<TSource> source, Func<TSource, bool> predicate) 
{
     if (source == null) throw Error.ArgumentNull("source");
     if (predicate == null) throw Error.ArgumentNull("predicate");
     foreach (TSource element in source) {
         if (predicate(element)) return true;
     }
     return false;
 }

 public static bool Any<TSource>(this IEnumerable<TSource> source)
 {
      if (source == null) throw Error.ArgumentNull("source");
      using (IEnumerator<TSource> e = source.GetEnumerator()) {
          if (e.MoveNext()) return true;
      }
      return false;
  }

Count

public static int Count<TSource>(this IEnumerable<TSource> source) 
{
    if (source == null) throw Error.ArgumentNull("source");
    ICollection<TSource> collectionoft = source as ICollection<TSource>;
    if (collectionoft != null) return collectionoft.Count;
    ICollection collection = source as ICollection;
    if (collection != null) return collection.Count;
    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator()) {
        checked {
            while (e.MoveNext()) count++;
        }
    }
    return count;
}

SingleOrDefault

public static TSource SingleOrDefault<TSource>(
    this IEnumerable<TSource> source, Func<TSource, bool> predicate) 
{
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    TSource result = default(TSource);
    long count = 0;
    foreach (TSource element in source) {
        if (predicate(element)) {
            result = element;
            checked { count++; }
        }
    }
    switch (count) {
        case 0: return default(TSource);
        case 1: return result;
    }
    throw Error.MoreThanOneMatch();
}

As you can see, 3 different strategies were used.

  • Any: Iterate the enumerator, with early exit
  • Count: Iterate the enumerator, unless ICollection, call Count
  • SingleOrDefault: Iterate the enumerator, without early exit

Some observations:

  • 'Any()' could have also been optimized for ICollection
  • 'SingleOrDefault' could have also been optimized to exit early
  • none of the methods care about IReadOnlyCollection, even though it also has a property 'Count'

Questions:

  • what are the pro's and contra's of each of these strategies?
  • is there a good reason why the implementation of LINQ is as it is?
  • 1
    If you pass `Any` a `predicate` you have to look at the items to determine if you have at least one that matches, so I don't see how it could be optimized for `ICollection` unless you're just talking about using a `for` loop instead. In any case this is way too opinion based for SO. – juharr Jul 26 '19 at 15:53
  • 1
    @steve16351 But it could exit with an error as soon as it sees a second match. – juharr Jul 26 '19 at 15:54
  • @juharr you are right about Any(predicate), I will also include Any(), which could be optimized for ICollection –  Jul 26 '19 at 15:57
  • The `Any` without a predicate just gets the enumerator and tries a `MoveNext` which should be fairly efficient even on an `ICollection` and then you avoid the casting. – juharr Jul 26 '19 at 15:59
  • 1
    You're looking at the .NET Framework source. .NET Core has some optimizations that .NET Framework doesn't. For example, `SingleOrDefault` with a predicate *does* short-circuit once it finds a second match. – madreflection Jul 26 '19 at 15:59
  • @juharr shouldn't Collection.Count be eager and IEnumerator.MoveNext lazy (with possible performance loss)? –  Jul 26 '19 at 16:01
  • 1
    It's not a eager vs lazy issue. Calling `GetEnumerator` will likely require creating a new object but I'm guessing any additional overhead was not significant enough to justify attempting the cast. – juharr Jul 26 '19 at 16:17

1 Answers1

1

Well, the first two methods are self-explanatory, aren't they? They are optimized in a way that they stop as soon as possible and also check if the type has a Count property to avoid the loop.

But the SingleOrDefault with predicate has indeed a strange implementation. It could stop at the 2nd matching item since then is clear that a InvalidOperationException must be thrown(Single... as opposed to First... ensures that there is at maximum 1 item). But instead it checks every item and counts the matches. The version without predicate has this optimization.

So the question is: What the heck? And it seems that is really a bug that won't be fixed since it's just reduces performance in error-case. I can't believe it.

By the way, the same bug exists in Enumerable.Single.

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • _SingleOrDefault_ was the most worrying one. Thanks for the clarification. The _Count_ check might be an arguable optimization for _Any_. –  Jul 26 '19 at 16:22
  • The fact they didn't bother checking for ICollection.Count in Any(), is probably because it wouldn't be an optimization here. –  Jul 26 '19 at 16:25
  • 1
    @dfhwze: the optimization would be so negligible that it wasn't wort it. Also consider the case that someone creates a type that implements `ICollection` in a way that calling the `Count` property will do a lot of work. Then this "optimization" would do the opposite – Tim Schmelter Jul 26 '19 at 16:28
  • @TimSchmelter Technically you could also create an implementation of `ICollection` where the `GetEnumerator` does a lot of work, but I'm guessing they thought that was much less likely. – juharr Jul 26 '19 at 17:09
  • 1
    @TimSchmelter Though that would go against the contract/general expectation that properties should only do light work and ideally return instantly. If someone writes a code with Count doing lots of work, it's their own problem of doing something very wrong. – ckuri Jul 26 '19 at 17:10
  • 1
    @ckuri maybe, but it's a fact that properties often do work and methods often don't. Microsoft should not write code that traps poorly written code unnecessarily but that makes as little trouble as necessary. – Tim Schmelter Jul 26 '19 at 17:16