8

Suppose I have some strings:

string[] strings = { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

What is the difference between:

string startsWithO = strings.First(s => s[0] == 'o');

And:

string startsWithO = strings.Where(s => s[0] == 'o').First();

Since Where() is deferred it shouldn't slow down the execution, right?

micahhoover
  • 2,101
  • 8
  • 33
  • 53
  • 1
    I'm 99% sure they turn into the exact same thing while compiled, but i'll let someone who is sure answer – Scott Selby Jan 10 '13 at 16:02
  • 2
    My own question might answer yours: http://stackoverflow.com/questions/10110013/order-of-linq-extension-methods-does-not-affect-performance – Tim Schmelter Jan 10 '13 at 16:03
  • I seem to have found a tiny difference: First() does not seem to take an index (i.e. First( (s, index) => (s[0] == 'o') && (index == 1))) – micahhoover Jan 10 '13 at 16:13
  • 1
    That's a fair point, although there's nothing to stop you implementing that `First` yourself :p – Rawling Jan 10 '13 at 16:17

3 Answers3

12

The performance penalty of using .Where(filter).First() rather than .First(filter) will usually be very small.

However, they're not the same - Where generates a new iterator that First can simply take one element of, whereas First(filter) can microoptimize by using just one loop and a direct return whenever filter matches.

So while both approaches have the same semantics and both execute the filter equally often (only as often as necessary), using First with a filter parameter doesn't need to create an intermediate iterator object and probably avoids some very simple method calls on that iterator too.

In other words, if you're executing such code millions of times, you will see a slight performance difference - but nothing huge; I would never worry about it. Whenever that tiny performance difference actually matters you're much better off just writing the (very simple) foreach-with-if statement that's equivalent and avoiding the extra calls and object allocations inherent in LINQ - but remember that this is a microoptimization you'll rarely need.

Edit: Benchmark demonstrating the effect:

This takes 0.78 seconds:

for(int i=0;i<10*1000*1000;i++)
  Enumerable.Range(0,1000).First(n=> n > 2);
GC.Collect();

But this takes 1.41 seconds:

for(int i=0;i<10*1000*1000;i++)
  Enumerable.Range(0,1000).Where(n=> n > 2).First();
GC.Collect();

Whereas plain loops are much faster (0.13 seconds):

long bla = 0;
for(int i=0;i<10*1000*1000;i++)
    for(int n=0;n<1000;n++)
        if(n > 2) { bla+=n; break; }
GC.Collect();
Console.WriteLine(bla);//avoid optimizer cheating.

Note that this benchmark only shows such extreme differences because I have a trivial filter and a very short non-matching prefix.

Based on some quick experimentation, the difference seems largely attributable to the details of which codepath gets taken. So, for array's and List<>s the first variant is actually faster, likely to do special-casing in .Where for those types that First doesn't have; for custom iterators, the second version is a tiny bit faster, as expected.

Summary:

.Where(...).First() is roughly as fast as .First(...) - don't bother choosing one or the other as an optimization. In general .First(...) is very slightly faster but in some common cases it is slower. If you really need that microoptimization then use plain loops which are faster than either.

Eamon Nerbonne
  • 47,023
  • 20
  • 101
  • 166
  • Yeah, in LINQ to SQL you'll probably see exactly the same SQL generated hence the same DB performance. You may however have a totally irrelevant&tiny difference in SQL generation performance; probably measurable, but swamped in every way by the DB query's overhead. – Eamon Nerbonne Jan 10 '13 at 16:13
  • Interesting answer @Eamon. I guess the payoff is too small to show up in Tim Schmelter's (link above) test. – micahhoover Jan 10 '13 at 16:16
  • Nice benchmarking! I wonder why @Eamon's benchmarking didn't show a difference when it is as much as a factor of 2. – micahhoover Jan 10 '13 at 16:44
  • @micahhoover: Tim's benchmarks have a much, much slower filter - `string.Contains` that probably dominates benchmark time. Also, Tim's scenario doesnt explicitly GC, and I suspect that tiny objects and GC overhead easily impact this benchmark since you may be creating quite a few when you want to detect the difference. – Eamon Nerbonne Jan 10 '13 at 16:59
1

There are no differences here.

Calling Where first returns an iterator that's not used until First starts looping over it.

If the predicate doesn't match any elements then the same exception InvalidOperationException is thrown.

The only difference is the verbosity of the code, so .First without .Where should be preferred

Sten Petrov
  • 10,943
  • 1
  • 41
  • 61
1

In the specific case, calling First and Where on a string[], the methods called are the Enumerable.Where and Enumerable.Firstextension methods.

Enumerable.Where does this:

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) 
{
  // null checks omitted
  if (source is TSource[]) 
     return new WhereArrayIterator<TSource>((TSource[])source, predicate); 
  //the rest of the method will not execute
}

and the constructor of WhereArrayIterator does just:

public WhereArrayIterator(TSource[] source, Func<TSource, bool> predicate) {
  this.source = source; 
  this.predicate = predicate;
} 

So nothing is actually done here, except to create an iterator.

The first First method, without a predicate does this:

public static TSource First<TSource>(this IEnumerable<TSource> source) { 
  //null check
  IList<TSource> list = source as IList<TSource>;
  if (list != null) {
     //this branch is not taken as string[] does not implement IList<string>
     if (list.Count > 0) return list[0]; 
  }
  else { 
    //this is actually the WhereArrayIterator from before
    using (IEnumerator<TSource> e = source.GetEnumerator()) { 
      if (e.MoveNext()) 
        return e.Current;
    } 
  }
  throw Error.NoElements();
}

However, the second First does this

public static TSource First<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
   //null checks
   foreach (TSource element in source) {
     if (predicate(element)) return element; 
   }
   throw Error.NoMatch();
}

Which in the case of an array, is as fast as direct linear access.
In a nutshell, this means that calling First(predicate) on an array will be somewhat faster, by a not large, but still detectable factor. This might not hold for lists, and will certainly not hold for IQueryable objects, that are a completely different story.

However, this is micro-optimization at it's worst. Unless this is done millions of times, it won't save too many seconds. Even as I know this now, I'll still use whatever is clearer to read and understand.

SWeko
  • 30,434
  • 10
  • 71
  • 106
  • Actually, according to my measurements on my machine quixotically on array's and `List<>`s specifically the `Where` then `First` solution is slightly *faster*. – Eamon Nerbonne Jan 10 '13 at 16:58
  • On my machine, for 100M iterations with arrays, `.Where.First` takes 14.5s, `.First` takes 13s, and explicit while loop takes 1.5s – SWeko Jan 10 '13 at 18:56
  • Yeah, that makes more sense - so you did 100M iterations in the outer loop - but how many in the inner loop? The `.Where.First` seemed to gain ground as the innermost loop got longer (i.e. as it took more iterations to find the first element). The crossover was around 11 elements; at 1000 elements the `First(...)` approach twice as slow. – Eamon Nerbonne Jan 11 '13 at 23:58