1

If I chain clauses such as

var results = elements
    .Where(n => n > 3)
    .Where(n => n % 2 == 0);

is this slower than just

var results = elements.Where(n => n > 3 && n % 2 == 0);

Explain why or why not?

EDIT: It seems that the consensus is that even POCO objects iterate twice. If this is the case can someone explain why Microsoft wouldn't combine these predicates. I stumbled across Enumerable.CombinePredicates that I thought did this. Can someone please explain what this does then.

happygilmore
  • 3,008
  • 4
  • 23
  • 37

2 Answers2

2

If you're talking LINQ-to-objects, each Where involves setting up a new iterator state machine, which is expensive, so yes, it's slower than putting both conditions together.

If you're talking about LINQ-to-something else, well, it depends; an extra Where might just involve an extra string concatenation somewhere. It's still likely to be more expensive, but the exect difference depends on the LINQ provider.

Rawling
  • 49,248
  • 7
  • 89
  • 127
2

Edit: I looked a little closer. The WhereEnumerableIterator returned by the Where extension method actually overrides the Where method and combines the predicates into a single callback.

public override IEnumerable<TSource> Where(Func<TSource, bool> predicate) {
    return new Enumerable.WhereEnumerableIterator<TSource>(
        this.source, 
        Enumerable.CombinePredicates<TSource>(this.predicate, predicate));
}

private static Func<TSource, bool> CombinePredicates<TSource>(
    Func<TSource, bool> predicate1, Func<TSource, bool> predicate2
    ) {
    return (TSource x) => predicate1(x) && predicate2(x);
}

So, the speed difference I saw on my machine should probably be attributed to something else.

The first example will loop over the elements collection once to find items that satisfy the condition item > 3, and again to find items that satisfy the condition item % 2 == 0.

The second example will loop over the elements collection once to find items that satisfy the condition item > 3 && item % 2 == 0.

In the examples provided, the second will most likely always be faster than the first, because it only loops over elements once.

Here is an example of some pretty consistent results I get on my machine (.NET 3.5):

    var stopwatch = new System.Diagnostics.Stopwatch();
    var elements = Enumerable.Range(1, 100000000);
    var results = default(List<int>);
    stopwatch.Start();
    results = elements.Where(n => n > 3).Where(n => n % 2 == 0).ToList();
    stopwatch.Stop();
    Console.WriteLine(stopwatch.Elapsed);
    stopwatch.Reset();
    stopwatch.Start();
    results = elements.Where(n => n > 3 && n % 2 == 0).ToList();
    stopwatch.Stop();
    Console.WriteLine(stopwatch.Elapsed);
    Console.WriteLine("Done");
    Console.ReadLine();

Results:

00:00:03.0932811
00:00:02.3854886
Done

EDIT:
@Rawling is right in that my explanation only applies to LINQ as used on collections of POCO objects. When used as an interface to LINQ-to-SQL, NHibernate, EF, etc., your results will be more implementation-dependent.

qxn
  • 17,162
  • 3
  • 49
  • 72
  • The first example doesn't loop over `elements` twice; it loops over it once, but passes values through two iterators rather than the single iterator in the second example. – Rawling Dec 05 '12 at 15:00
  • @Rawling -- Won't the "MoveNext" method be called on all iterators of the first collection before any iterators of the second collection? – qxn Dec 05 '12 at 15:10
  • The first `Where` calls a `foreach` against the input collection, but the second `Where` calls a `foreach` against the result of the first `Where`. The input collection only gets iterated once, but there are other iterators that are set up and used at the same time. – Rawling Dec 05 '12 at 15:23
  • There are still more delegate invocations so there is still an O(N) speed difference. – usr Dec 05 '12 at 15:57