Proper LINQ where clauses

Question

I write a fair amount of LINQ in my day-to-day life, but mostly simple statements. I have noticed that there are many ways to write where clauses, and as far as I can tell they all give the same results. For example:

from x in Collection
  where x.Age == 10
  where x.Name == "Fido"
  where x.Fat == true
  select x;

appears to be equivalent to this at least as far as the results are concerned:

from x in Collection
  where x.Age == 10 &&
        x.Name == "Fido" &&
        x.Fat == true
  select x;

So is there really a difference other than syntax? If so, what is the preferred style and why?

Jon Skeet · Answer 1 · 2015-04-27T16:23:35.990

EDIT: LINQ to Objects doesn't behave how I'd expected it to. You may well be interested in the blog post I've just written about this...

They're different in terms of what will be called - the first is equivalent to:

Collection.Where(x => x.Age == 10)
          .Where(x => x.Name == "Fido")
          .Where(x => x.Fat == true)

whereas the latter is equivalent to:

Collection.Where(x => x.Age == 10 && 
                      x.Name == "Fido" &&
                      x.Fat == true)
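To make the translation concrete, here is a small self-contained sketch. It assumes LINQ to Objects and uses a hypothetical array of value tuples in place of Collection (the field names mirror the question). It checks that the query-syntax form, its chained method equivalent, and the single combined predicate all yield the same elements:

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the question's Collection.
// Value tuples keep the sketch free of extra type declarations.
var collection = new[]
{
    (Age: 10, Name: "Fido", Fat: true),
    (Age: 10, Name: "Fido", Fat: false),
    (Age: 3,  Name: "Fido", Fat: true),
    (Age: 10, Name: "Rex",  Fat: true),
};

// Query syntax with three where clauses...
var chainedQuery =
    from x in collection
    where x.Age == 10
    where x.Name == "Fido"
    where x.Fat
    select x;

// ...is translated by the compiler into three chained Where calls.
var chainedMethods = collection
    .Where(x => x.Age == 10)
    .Where(x => x.Name == "Fido")
    .Where(x => x.Fat);

// A single where clause joined with && becomes one Where call.
var combined = collection.Where(x => x.Age == 10 && x.Name == "Fido" && x.Fat);

Console.WriteLine(chainedQuery.SequenceEqual(chainedMethods)); // True
Console.WriteLine(chainedQuery.SequenceEqual(combined));       // True
```

Only the first tuple passes all three conditions, so every form returns that single element.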

Now what difference that actually makes depends on the implementation of Where being called. If it's a SQL-based provider, I'd expect the two to end up creating the same SQL. If it's in LINQ to Objects, the second will have fewer levels of indirection (there'll be just two iterators involved instead of four). Whether those levels of indirection are significant in terms of speed is a different matter.

Typically I would use several where clauses if they feel like they're representing significantly different conditions (e.g. one is to do with one part of an object, and one is completely separate) and one where clause when various conditions are closely related (e.g. a particular value is greater than a minimum and less than a maximum). Basically it's worth considering readability before any slight performance difference.
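As an illustration of that guideline, here is a sketch with hypothetical order data (the tuple fields, the min/max bounds and the status values are all made up for the example). The two halves of a range check share one where clause, while the unrelated status filter gets its own:

```csharp
using System;
using System.Linq;

// Hypothetical orders: (Id, Total, Status).
var orders = new[]
{
    (Id: 1, Total: 25m,  Status: "Shipped"),
    (Id: 2, Total: 250m, Status: "Shipped"),
    (Id: 3, Total: 60m,  Status: "Cancelled"),
    (Id: 4, Total: 75m,  Status: "Shipped"),
};

const decimal min = 50m, max = 100m;

var query =
    from o in orders
    // Closely related conditions (one range check) share a single where clause.
    where o.Total >= min && o.Total <= max
    // An unrelated concern gets its own where clause.
    where o.Status != "Cancelled"
    select o.Id;

Console.WriteLine(string.Join(", ", query)); // 4
```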

@JonSkeet Maybe I'm wrong, but after a quick review of the LINQ Where implementation, I'm not sure of that. Nested Where calls are combined by a static method, 'CombinePredicates'. The collection is iterated only once, by a single iterator with the combined predicate. Of course, combining the delegates has a performance cost, but it's very limited. Do you agree? — Cybermaxs, Apr 04 '13 at 11:54
@Cybermaxs: Not sure of *what*, precisely? I never suggested that the collection would be iterated over more than once. — Jon Skeet, Apr 04 '13 at 11:57
@JonSkeet Yes, of course, but in the end all predicates are combined and only one iterator is involved. Look at Enumerable.WhereSelectEnumerableIterator. — Cybermaxs, Apr 04 '13 at 12:46
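A quick way to see what this thread is about is to log the order in which the predicates run. This sketch assumes LINQ to Objects, and IsTen/IsSmall are hypothetical predicates. It shows that chained Where calls are evaluated lazily in a single pass, with each element going through both predicates before the next element is read. Whether the runtime also merges the two delegates internally, as described above, does not change the observable order:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();

// Each predicate records which element it saw, so the evaluation order is visible.
bool IsTen(int x)   { log.Add($"IsTen({x})");   return x % 10 == 0; }
bool IsSmall(int x) { log.Add($"IsSmall({x})"); return x < 25; }

var source = new[] { 10, 15, 20, 30 };

// Chained Where calls: nothing runs until the result is enumerated.
var chained = source.Where(IsTen).Where(IsSmall);

// ToList enumerates once; an element that fails IsTen never reaches IsSmall.
var result = chained.ToList();

Console.WriteLine(string.Join(" ", log));
// IsTen(10) IsSmall(10) IsTen(15) IsTen(20) IsSmall(20) IsTen(30) IsSmall(30)
Console.WriteLine(string.Join(", ", result)); // 10, 20
```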

score 79 · Accepted Answer · answered Jun 15 '11 at 15:13

The second one would be more efficient, as it has just one predicate to evaluate against each item in the collection, whereas the first one applies the first predicate to all items and then uses the result (which is narrowed down at this point) for the second predicate, and so on. The results get narrowed down with every pass, but it still involves multiple passes.

Also, the chaining (first method) works only if you are ANDing your predicates. A condition like x.Age == 10 || x.Fat == true cannot be expressed with your first method.

Chaining ORed conditions is possible to some extent using this extension: http://www.albahari.com/nutshell/predicatebuilder.aspx — jahu, May 16 '14 at 12:58
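A sketch of both points, assuming LINQ to Objects and hypothetical data: chained Where calls can only express AND, an OR has to sit inside one predicate, and a small hypothetical Or helper can compose separately defined predicates. The helper is not the PredicateBuilder API from the link. PredicateBuilder composes expression trees so that a provider such as LINQ to SQL can translate the result, whereas this delegate-only version works for LINQ to Objects only:

```csharp
using System;
using System.Linq;

// Hypothetical data: (Age, Fat).
var dogs = new[]
{
    (Age: 10, Fat: false),
    (Age: 4,  Fat: true),
    (Age: 10, Fat: true),
    (Age: 2,  Fat: false),
};

Func<(int Age, bool Fat), bool> isTen = d => d.Age == 10;
Func<(int Age, bool Fat), bool> isFat = d => d.Fat;

// Chaining always means AND: only dogs that satisfy both predicates survive.
var chained = dogs.Where(isTen).Where(isFat).ToList();

// OR has to live inside a single predicate.
var either = dogs.Where(d => isTen(d) || isFat(d)).ToList();

// Hypothetical helper (not the PredicateBuilder API): compose two delegates with OR.
static Func<T, bool> Or<T>(Func<T, bool> a, Func<T, bool> b) => x => a(x) || b(x);

var composed = dogs.Where(Or(isTen, isFat)).ToList();

Console.WriteLine(chained.Count);  // 1
Console.WriteLine(either.Count);   // 3
Console.WriteLine(composed.Count); // 3
```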

score 15 · Answer 3 · answered Jun 15 '11 at 15:23

When I run

from c in Customers
where c.CustomerID == 1
where c.CustomerID == 2
where c.CustomerID == 3
select c

and

from c in Customers
where c.CustomerID == 1 &&
      c.CustomerID == 2 &&
      c.CustomerID == 3
select c

in LINQPad against my Customers table, both output the same SQL query:

-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 Int = 2
DECLARE @p2 Int = 3
-- EndRegion
SELECT [t0].[CustomerID], [t0].[CustomerName]
FROM [Customers] AS [t0]
WHERE ([t0].[CustomerID] = @p0) AND ([t0].[CustomerID] = @p1) AND ([t0].[CustomerID] = @p2)

So in the translation to SQL there is no difference, and you have already seen in other answers how the two forms are converted to lambda expressions.

OK, so you're saying it will not have any performance effect whichever of these I use? — Bimal Das, May 27 '17 at 05:18
WHERE clauses are chained in fact. So, it doesn't matter how you write it. There is no performance difference. — hastrb, Aug 18 '17 at 10:39

user7116 · Answer 4 · score 14 · 2011-06-15T15:19:25.010

The first one will be implemented as:

Collection.Where(x => x.Age == 10)
          .Where(x => x.Name == "Fido") // applied to the result of the previous
          .Where(x => x.Fat == true)    // applied to the result of the previous

As opposed to the much simpler (and ~~far faster~~ presumably faster):

// all in one fell swoop
Collection.Where(x => x.Age == 10 && x.Name == "Fido" && x.Fat == true)

edited Jun 15 '11 at 15:19

answered Jun 15 '11 at 15:13

user7116

"Far faster"? We don't even know which LINQ implementation is involved yet, so it's hard to attach any performance implication to it. – Jon Skeet Jun 15 '11 at 15:15
In the general case the latter only requires 1 loop. A provider could choose to flatten the first example, but it is not required. – user7116 Jun 15 '11 at 15:17

Indeed... but you're claiming the latter *is* far faster. It's not at all clear that it will be *significantly* faster at all - after all, the significance of the performance difference will depend on how this is being used. – Jon Skeet Jun 15 '11 at 15:19

@Jon: no disagreement. As you note the reality could be the LINQ provider goes and does useful optimization transformations to the expression. But given the second one requires only one loop and benefits from boolean short circuiting, it is hard to see why it shouldn't be labeled as "far faster" in general terms. If the OP only has 5 elements my point is moot. – user7116 Jun 15 '11 at 15:23

It seems they end up as the same expression, since chained predicates get combined: https://github.com/microsoft/referencesource/blob/5697c29004a34d80acdaf5742d7e699022c64ecd/System.Core/System/Linq/Enumerable.cs#L61 – Lee Mar 20 '22 at 10:45
For LINQ to Objects, the Big O complexity should be O(N) for the combined conditions in a single Where() and O(3N) for the multiple Where() calls. But the constant factor is dropped in Big O, so the latter reduces to O(N) as well, and the two should theoretically be roughly equal in performance. That's assuming that LINQ goes through the entire collection once for each Where(). – todji Nov 07 '22 at 15:14
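To check the "once for each Where()" assumption, this sketch counts condition evaluations in LINQ to Objects (Even and DivBy3 are hypothetical predicates over the numbers 1 to 1000). Because chained Where calls stream each element through the chain, and && short-circuits, both forms perform the same number of condition checks; what remains is the per-element overhead of the extra delegate or iterator layer:

```csharp
using System;
using System.Linq;

int calls = 0;

// Hypothetical predicates that count how often they are invoked.
bool Even(int x)   { calls++; return x % 2 == 0; }
bool DivBy3(int x) { calls++; return x % 3 == 0; }

var source = Enumerable.Range(1, 1000);

// Chained: an element that fails Even never reaches DivBy3.
calls = 0;
int chainedCount = source.Where(Even).Where(DivBy3).Count();
int chainedCalls = calls;

// Combined: && short-circuits in the same way.
calls = 0;
int combinedCount = source.Where(x => Even(x) && DivBy3(x)).Count();
int combinedCalls = calls;

Console.WriteLine($"{chainedCount} {chainedCalls}");   // 166 1500
Console.WriteLine($"{combinedCount} {combinedCalls}"); // 166 1500
```

Even runs for all 1000 numbers and DivBy3 only for the 500 even ones, in both forms.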

score 3 · Answer 5 · answered Jun 15 '11 at 15:16

Looking under the hood, the two statements will be transformed into different query representations. Depending on the QueryProvider of Collection, this might be optimized away or not.
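One way to see the different representations is to inspect the expression tree an IQueryable builds. This sketch uses AsQueryable() as a stand-in for a real provider (an assumption; the Queryable operators build the same shape of tree whatever the provider), and CountWhereCalls is a hypothetical helper that walks down the chain of method calls:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper: count the Where calls in a source.Where(...).Where(...) chain.
static int CountWhereCalls(Expression expr)
{
    int count = 0;
    // Each Queryable operator wraps the previous expression as its first argument.
    while (expr is MethodCallExpression call)
    {
        if (call.Method.Name == "Where") count++;
        expr = call.Arguments[0];
    }
    return count;
}

IQueryable<int> numbers = new[] { 1, 2, 3, 4, 5, 6 }.AsQueryable();

var chained = numbers.Where(x => x > 1).Where(x => x % 2 == 0);
var combined = numbers.Where(x => x > 1 && x % 2 == 0);

Console.WriteLine(CountWhereCalls(chained.Expression));      // 2
Console.WriteLine(CountWhereCalls(combined.Expression));     // 1
Console.WriteLine(chained.AsEnumerable().SequenceEqual(combined)); // True
```

The provider receives two nested Where calls in one case and one in the other; merging them, as the LINQPad output above shows LINQ to SQL doing, is up to the provider.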

When this is a LINQ to Objects call, multiple where clauses will lead to a chain of IEnumerables that read from each other. Using the single-clause form will help performance here.

When the underlying provider translates it into a SQL statement, the chances are good that both variants will create the same statement.
