Does the compiler concatenate LINQ where queries?

Question

Consider the following two similar code samples.

One where clause.

bool validFactory
  = fields
    .Where(
      field => field.FieldType == typeof( DependencyPropertyFactory<T> ) &&
               field.IsStatic )
    .Any();

Two where clauses.

bool validFactory
  = fields
    .Where( field => field.FieldType == typeof( DependencyPropertyFactory<T> ) )
    .Where( field => field.IsStatic )
    .Any();
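A self-contained way to run both samples side by side. This is a sketch under stated assumptions: the fields of System.Int32 (its public constants plus its non-public instance field) stand in for the question's `fields` array, and `typeof(int)` stands in for `typeof(DependencyPropertyFactory<T>)`, which is not available outside the original code base.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Assumption: int's fields replace the question's `fields`, and typeof(int)
// replaces typeof(DependencyPropertyFactory<T>). Including Instance and
// NonPublic fields makes the IsStatic test actually filter something.
FieldInfo[] fields = typeof(int).GetFields(
    BindingFlags.Public | BindingFlags.NonPublic |
    BindingFlags.Static | BindingFlags.Instance);

// One Where clause holding both conditions.
bool validFactoryOne = fields
    .Where(field => field.FieldType == typeof(int) &&
                    field.IsStatic)
    .Any();

// Two chained Where clauses, one condition each.
bool validFactoryTwo = fields
    .Where(field => field.FieldType == typeof(int))
    .Where(field => field.IsStatic)
    .Any();

Console.WriteLine($"{validFactoryOne} {validFactoryTwo}"); // True True
```

Both forms agree because int's constants (MaxValue, MinValue) are static fields of type int.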

I prefer the second sample, since I find it more readable and it causes fewer formatting issues, especially with auto-formatting. It also makes it easier to place comments next to (or above) the separate conditions to clarify the intent.

My intuition says the second code sample would be less efficient. I could of course write a simple test myself (and will if nobody knows the answer). For now, I thought this was perfect food for SO. ;p

Is one more efficient than the other?
Is the compiler smart enough to optimize this?

Seems a duplicate, but since I can't figure out how to search for the earlier question(s) either... — BoltClock, Sep 11 '11 at 23:18
Highly related: http://stackoverflow.com/questions/6359980/proper-linq-where-clauses — Steven Jeuris, Sep 12 '11 at 07:31

score 12 · Accepted Answer · edited Sep 12 '11 at 15:07

The compiler does not attempt to optimize successive "where" calls. The runtime library does. If you chain a whole bunch of "where" and "select" calls, the runtime will attempt to reorganize them into a more efficient form.
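A minimal sketch of the effect described above, assuming the fusion behaves roughly like combining the two predicates with &&. `CombinePredicates` is a hypothetical local helper, not a public LINQ API; the library's real iterator types are internal and differ between .NET versions.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not a public LINQ API): fuse two filters into one
// predicate that runs the first test and, only if it passes, the second.
static Func<T, bool> CombinePredicates<T>(Func<T, bool> first, Func<T, bool> second)
    => item => first(item) && second(item);

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Func<int, bool> isEven = n => n % 2 == 0;
Func<int, bool> isLarge = n => n > 5;

// Chained form, as in the question's second sample.
int[] chained = numbers.Where(isEven).Where(isLarge).ToArray();

// Fused form: a single Where over the combined predicate.
int[] fused = numbers.Where(CombinePredicates(isEven, isLarge)).ToArray();

Console.WriteLine(string.Join(", ", chained)); // 6, 8, 10
Console.WriteLine(string.Join(", ", fused));   // 6, 8, 10
```

Because && short-circuits, the second test only runs for elements that pass the first, which is the same per-element order the chained form already has; fusing mainly saves one layer of iterator.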

In some unusual cases, of course, the "optimization" turns out to make things worse. I seem to recall that Jon Skeet wrote an article about that recently, though I'm not sure where it is.

answered Sep 12 '11 at 02:23 by Eric Lippert · edited by Gabe

When you talk about the *runtime*, are you talking about an ORM like EF or L2S, or the CLR? Because I'd expect that they'd result in the same SQL query being generated, but in LINQ-to-Objects I'd expect that no optimizations would occur. – Gabe Sep 12 '11 at 02:34
Eric, I think you are referring to this blog post. http://msmvps.com/blogs/jon_skeet/archive/2011/06/16/linq-to-objects-and-the-performance-of-nested-quot-where-quot-calls.aspx – SolutionYogi Sep 12 '11 at 02:58
@Gabe: I'm talking about LINQ-to-Objects. Your expectation that no optimizations would occur does not match reality, so my advice to you is to adjust your expectations. – Eric Lippert Sep 12 '11 at 05:22
Are we just arguing over the definition of the word "runtime"? To me "runtime" means CLR, which as far as I know has no hand in it. The optimization happens when the compiler selects `Enumerable.WhereEnumerableIterator.Where` as the second `Where` function instead of `Enumerable.Where`. I'd say that either the library or the compiler (maybe both) are responsible for the optimization rather than the runtime. – Gabe Sep 12 '11 at 06:12
@Gabe: To me "the runtime" means *any* component that does its work at runtime. In this case it is the library that has the optimization smarts in it. – Eric Lippert Sep 12 '11 at 13:45
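To illustrate where the decision is made: after the first call, the compiler only sees the static type IEnumerable<int>, so the second `.Where` can only bind to the extension method `Enumerable.Where`. As far as I know from the published reference source, it is `Enumerable.Where` itself that inspects its source at run time and, if that source is one of its own Where iterators, builds a combined one. The data below is illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5, 6 };

// Statically this is just IEnumerable<int>; the compiler knows nothing more.
IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

// So this call binds to Enumerable.Where; any special handling has to
// happen inside that method at run time, based on the dynamic type of `evens`.
IEnumerable<int> largeEvens = evens.Where(n => n > 2);

// The dynamic types are internal iterator classes; their names are an
// implementation detail and may differ between .NET versions.
Console.WriteLine(evens.GetType().Name);
Console.WriteLine(largeEvens.GetType().Name);
Console.WriteLine(string.Join(", ", largeEvens)); // 4, 6
```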

score 4 · Answer 2 · edited Sep 12 '11 at 03:17

The compiler is not allowed to optimize this because it doesn't know what Where() does. For example, you may have overloaded Where() with a version that logs its results. (The jitter could do optimization, but in practice it is unlikely to.)

The efficiency difference is unlikely to be significant. You can profile your application to see if it matters.
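A rough way to check whether the difference matters, sketched with Stopwatch under the assumption that a simple warm-up-then-measure run is good enough for a first look; a dedicated harness such as BenchmarkDotNet would be more trustworthy. It uses Count() rather than Any(), because Any() stops at the first match and would measure almost nothing. The integer predicates are hypothetical stand-ins for the FieldType and IsStatic checks.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

int[] data = Enumerable.Range(0, 1_000_000).ToArray();

// Runs the query once to warm up (JIT), then times a second run.
static TimeSpan Time(Func<int> action, out int result)
{
    action(); // warm-up run so JIT compilation is not measured
    Stopwatch watch = Stopwatch.StartNew();
    result = action();
    watch.Stop();
    return watch.Elapsed;
}

// One Where with both conditions versus two chained Where calls.
TimeSpan single = Time(() => data.Where(n => n % 3 == 0 && n % 5 == 0).Count(), out int singleCount);
TimeSpan chained = Time(() => data.Where(n => n % 3 == 0).Where(n => n % 5 == 0).Count(), out int chainedCount);

Console.WriteLine($"single Where:  {single.TotalMilliseconds:F2} ms ({singleCount} matches)");
Console.WriteLine($"chained Where: {chained.TotalMilliseconds:F2} ms ({chainedCount} matches)");
```

Timings depend on the machine and runtime version, so no expected numbers are given; only the match counts are guaranteed to agree.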

Update: Apparently the jitter does perform optimization here. See Eric Lippert's answer.

answered Sep 11 '11 at 23:43 by Raymond Chen

The jitter doesn't do the optimization; the implementation of Where in the LINQ-to-Objects library performs the optimization; it knows when its argument is another Where or Select query. – Eric Lippert Sep 12 '11 at 05:24
The jitter isn't what does the optimisation here; it's the code behind the Where call. The IEnumerable that is returned is lazily evaluated, so by the time you ask for the first element, both Where calls have been made, and the inner one can be combined with the outer one lazily. This is different from any cleverness of the jitter. – SecurityMatt Mar 11 '12 at 16:59
Also, when you use C# 4.0's 'from', 'where' and 'select' keywords (which are translated internally into these calls), the compiler is allowed to know the semantics of where and so could combine the calls, although I'm pretty sure the Visual Studio compiler doesn't currently perform this optimisation. – SecurityMatt Mar 11 '12 at 17:00
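For reference, the query-keyword form with two where clauses. Under the C# specification's query-expression translation rules, each where clause becomes its own Where call and the trailing identity select is dropped, so both forms below build the same chain of calls. The integer data is illustrative only.

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

// Translated by the compiler into
// numbers.Where(n => n % 2 == 0).Where(n => n > 5)
var queryForm =
    from n in numbers
    where n % 2 == 0
    where n > 5
    select n;

// The equivalent method-call form, written by hand.
var methodForm = numbers.Where(n => n % 2 == 0).Where(n => n > 5);

Console.WriteLine(string.Join(", ", queryForm));  // 6, 8, 10
Console.WriteLine(string.Join(", ", methodForm)); // 6, 8, 10
```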

score 1 · Answer 3 · answered Sep 12 '11 at 00:02

I wouldn't expect a significant difference here. However, if you forced enumeration of the collection, I would expect more of a difference:

bool validFactory
  = fields
    .Where( field => field.FieldType == typeof( DependencyPropertyFactory<T> ) )
    .ToList()   // materializes every FieldType match before going on
    .Where( field => field.IsStatic )
    .ToList()   // materializes every static match as well
    .Any();

In your two original code samples I see identical execution: the first item is checked for FieldType, then for IsStatic, and if it passes both, true is returned. Otherwise the second item is checked, and so on. The entire set does not need to be enumerated.
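The per-item order can be observed by logging each predicate call. In this sketch, integers are hypothetical stand-ins for the fields: A plays the role of the FieldType test and B the IsStatic test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] items = { 1, 2, 3, 4, 5, 6 };
var log = new List<string>();

// Each predicate records which element it inspected.
bool found = items
    .Where(n => { log.Add($"A({n})"); return n % 2 == 0; })
    .Where(n => { log.Add($"B({n})"); return n > 3; })
    .Any();

Console.WriteLine(found);                 // True
Console.WriteLine(string.Join(" ", log)); // A(1) A(2) B(2) A(3) A(4) B(4)
```

Elements 5 and 6 are never inspected, because Any() stops as soon as one element passes both filters.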

In the ToList() sample, the entire set is enumerated for the FieldType check before any IsStatic check runs, so it is likely to be less efficient. Note that this isn't necessary in either of your snippets.
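Logging the predicate calls in a ToList() version makes the difference visible: the first filter runs over every element before the second filter sees any of them. Integers are hypothetical stand-ins; A plays the FieldType test and B the IsStatic test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] items = { 1, 2, 3, 4, 5, 6 };
var log = new List<string>();

// ToList() after the first Where forces it over the whole array
// before the second Where inspects a single element.
bool found = items
    .Where(n => { log.Add($"A({n})"); return n % 2 == 0; })
    .ToList()
    .Where(n => { log.Add($"B({n})"); return n > 3; })
    .ToList()
    .Any();

Console.WriteLine(found);                 // True
Console.WriteLine(string.Join(" ", log)); // A(1) A(2) A(3) A(4) A(5) A(6) B(2) B(4) B(6)
```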
