275

Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?

For example...

-- Condition in JOIN
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
AND CUS.FirstName = 'John'

-- Condition in WHERE
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
WHERE CUS.FirstName = 'John'

Which do you prefer (and perhaps why)?

Steve Dignan
  • 8,230
  • 6
  • 30
  • 34
  • 6
    Did you run the two queries? Did you check the execution plans generated by the two queries? What did you observe? – S.Lott Jun 19 '09 at 16:51
  • 30
    @S.Lott, this query is for example purposes only. I'm just wondering "in general" which is the preferred method -- if any. – Steve Dignan Jun 19 '09 at 16:54
  • 1
    @Steve Dignan: You should benchmark this with sample data and look at the query plans. The answer will be very, very clear. And -- bonus -- you'll have a piece of code you can reuse when more complex situations arise. – S.Lott Jun 19 '09 at 17:01
  • 2
    I would personally put the condition in the JOIN clause if the condition describes the relation. Generic conditions that just filter the result set would go to the WHERE part then. E.g. `FROM Orders JOIN OrderParties ON Orders.Id = OrderParties.Order AND OrderParties.Type = 'Recipient' WHERE Orders.Status = 'Canceled'` – Glutexo May 02 '16 at 14:02
  • 1
    The question and solutions pertain specifically to INNER JOINs. If the join is a LEFT/RIGHT/FULL OUTER JOIN, then it is not a matter of preference or performance, but one of correct results. The *SQL Cookbook* (§ *11.3. Incorporating OR Logic when Using Outer Joins*) demonstrates the difference between the join and where conditions. – Clint Pachl Aug 29 '20 at 22:13

10 Answers10

199

The relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN process.

I recommend you write the queries in the most readable way possible.

Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.

For example, instead of:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
    AND c.State = 'NY'
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
    AND a.Status = 1

Write:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
    AND a.Status = 1

But it depends, of course.

Brien Foss
  • 3,336
  • 3
  • 21
  • 31
Cade Roux
  • 88,164
  • 40
  • 182
  • 265
  • 17
    It's not only about clean query or readability, it's about performance. putting conditions in join improve performance for large amount of data with properly indexed tables. – Shahdat Oct 06 '16 at 15:13
  • @Shahdat never seen an execution plan where it made any difference on equivalent conditions in where or join – Cade Roux Oct 06 '16 at 16:03
  • 2
    I just run monthly sales reports joining 5-6 tables on few millions records. Perf improves by 30% - sql server 2012 – Shahdat Oct 06 '16 at 16:33
  • 4
    @Shahdat if you are getting that significant a performance difference moving your filter conditions from the where clause to the inner join you need to post those execution plans. – Cade Roux Oct 06 '16 at 16:36
  • 7
    @Cade I have investigated the execution plans - both scenarios showing same cost. I run the queries multiple times seems both taking same about of time. Previously, I was running the queries on production and got significant performance difference because database was being used by live users. Sorry for that confusion. – Shahdat Oct 18 '16 at 16:22
  • 13
    This answer is right for INNER JOINs but not for left/right joins. – yakya May 10 '17 at 12:43
  • 1
    @sotn yes, this question is only about the lack of any performance difference on 'INNER JOIN' vs 'WHERE'. – Cade Roux May 10 '17 at 14:14
  • Sometimes adding a search criteria in JOIN clause instead of in WHERE clause matters. For example you can see the answer of this question where the poster talks about how a condition put in the WHERE clause logically converts the written LEFT JOIN into an INNER JOIN.: https://dba.stackexchange.com/questions/143090/mysql-left-join-not-working-as-expected – user1451111 Apr 08 '18 at 07:58
  • @CadeRoux the question poster does not talk specifically about the INNER JOINS rather he asked in general about JOINS. He might have meant to say all joins (inner, left, right) – user1451111 Apr 08 '18 at 08:00
  • i thik thats the Engine optimisation only. but what if you play with all boolean operators!? – tdjprog Apr 08 '18 at 21:32
  • @user1451111 that is correct. The OP only gave example of INNER JOIN and that was only what was being discussed. In fact, the LEFT JOIN nullification is a good example of the kind of transform which makes the INNER JOIN and WHERE completely equivalent. – Cade Roux Apr 09 '18 at 12:53
  • I see different explains in postgresql doing one and each other form. A strategy named join filter... – Luciano Andress Martini Oct 10 '18 at 15:39
  • @LucianoAndressMartini I think you should post the example as an answer to this question, as I'm sure people would be interested in whether that is just a difference in nomenclature or an actual change in the performance and execution plan. – Cade Roux Oct 10 '18 at 19:07
  • @sotn How is the performance difference for LEFT/RIGHT JOIN ? – Grace Jul 04 '22 at 09:26
175

For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).

However where you put the condition makes a huge difference if you are using left or right joins. For instance consider these two queries:

SELECT *
FROM dbo.Customers AS CUS 
LEFT JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'

SELECT *
FROM dbo.Customers AS CUS 
LEFT JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'

The first will give you only those records that have an order dated later than May 15, 2009 thus converting the left join to an inner join.

The second will give those records plus any customers with no orders. The results set is very different depending on where you put the condition. (Select * is for example purposes only, of course you should not use this in production code.)

The exception to this is when you want to see only the records in one table but not the other. Then you use the where clause for the condition not the join.

SELECT *
FROM dbo.Customers AS CUS 
LEFT JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null
cryanbhu
  • 4,780
  • 6
  • 29
  • 47
HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • 4
    "thus converting the left join to an inner join". How? Can you elaborate a bit? – user1451111 Apr 16 '18 at 14:57
  • @user1451111 Learn what LEFT/RIGHT JOIN returns: INNER JOIN rows plus unmatched left/right table rows extended by NULLs. FULL JOIN returns INNER JOIN rows UNION ALL unmatched left & right table rows extended by NULLs. Always know what INNER JOIN you want as part of an OUTER JOIN. A WHERE or ON that requires a possibly NULL-extended column to be not NULL after an OUTER JOIN ON removes any rows extended by NULLs, ie leaves only INNER JOIN rows, ie "turns an OUTER JOIN into an INNER JOIN". – philipxy Sep 21 '18 at 22:07
  • 3
    @user1451111 or, in simpler terms: `A left join B` is every row from A joined to every matching row from B. If B has no row that matches, then the A columns have a value but every column from B on that row shows as NULL values. If you have written `where B.somecolumn = ‘somevalue’` then you have a NULL (B.somecolumn) being compared with ‘somevalue’ . Anything compared with NULL is false, so all your rows where there is no matching B row for the A row, are eliminated, and the results you get are the same as an INNER JOIN would give, hence the outer join has become an inner one – Caius Jard Sep 22 '18 at 05:43
  • yes I have checked results are same for : SELECT funds.id, prospects.id FROM `funds` inner join prospects on (prospects.id = funds.lead_id and prospects.is_manual='no') and SELECT funds.id, prospects.id FROM `funds` left join prospects on (prospects.id = funds.lead_id) where prospects.is_manual='no' – Rohit Dhiman May 24 '19 at 10:15
  • I think the question was about situations when both cases are semantically identical (produce the same results), but performance may differ. If two cases give different results, one of them is obviously incorrect, there is nothing to discuss. – C-F Jun 24 '19 at 23:50
31

Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.

I prefer to keep join conditions separate from query restriction conditions.

If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 2
    I agree with you that syntactically it's cleaner, and I have to defer to your knowledge of that book and your very high reputation, but I can think of 4 queries in the last week with very different execution plans, CPU times, and logical reads when I moved where predicates to the join. – marr75 Jun 19 '09 at 16:58
  • 2
    You were asking about best practices. As soon as you get into testing how a specific RDBMS implementation works, other folks have given the correct advice: benchmark. – Bill Karwin Jun 19 '09 at 17:09
15

WHERE will filter after the JOIN has occurred.

Filter on the JOIN to prevent rows from being added during the JOIN process.

TheTXI
  • 37,429
  • 10
  • 86
  • 110
  • 11
    Semantically, they are prevented during the INNER JOIN process, but the optimizer can rearrange INNER JOIN and WHERE predicates at will, so the optimizer is free to exclude them later if it wishes. – Cade Roux Jun 19 '09 at 17:02
  • 1
    Cade Roux: Right. Often times what you write in SQL isn't what the optimizer will give you when all is said and done. I would suppose then that this would be right in an all-theory world, while your answer is of course more correct in the world of automatic query optimizers :) – TheTXI Jun 19 '09 at 17:09
  • I like this explanation of the condition in the `ON` – Robert Aug 30 '18 at 12:23
3

I prefer the JOIN to join full tables/Views and then use the WHERE To introduce the predicate of the resulting set.

It feels syntactically cleaner.

Johnno Nolan
  • 29,228
  • 19
  • 111
  • 160
2

I typically see performance increases when filtering on the join. Especially if you can join on indexed columns for both tables. You should be able to cut down on logical reads with most queries doing this too, which is, in a high volume environment, a much better performance indicator than execution time.

I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.

marr75
  • 5,666
  • 1
  • 27
  • 41
2

Agree with 2nd most vote answer that it will make big difference when using LEFT JOIN or RIGHT JOIN. Actually, the two statements below are equivalent. So you can see that AND clause is doing a filter before JOIN while the WHERE clause is doing a filter after JOIN.

SELECT *
FROM dbo.Customers AS CUS 
LEFT JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'

SELECT *
FROM dbo.Customers AS CUS 
LEFT JOIN (SELECT * FROM dbo.Orders WHERE OrderDate >'20090515') AS ORD 
ON CUS.CustomerID = ORD.CustomerID
user3512680
  • 407
  • 4
  • 5
  • I also tested the equivalency of the joins as described in this post. An example is posted at github gists [https://gist.github.com/northernocean/a70ac96e459987e77c99660c4a650218](https://gist.github.com/northernocean/a70ac96e459987e77c99660c4a650218) – topsail Nov 05 '21 at 15:57
0

Joins are quicker in my opinion when you have a larger table. It really isn't that much of a difference though especially if you are dealing with a rather smaller table. When I first learned about joins, i was told that conditions in joins are just like where clause conditions and that i could use them interchangeably if the where clause was specific about which table to do the condition on.

Eric
  • 7,930
  • 17
  • 96
  • 128
-1

Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.

Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.

Jacob B
  • 2,005
  • 13
  • 12
  • 3
    Sometimes these results are kinda "expected" and sometimes even "intentional" (for example with outer joins, where WHERE condition has different semantics than JOIN condition). – Marcel Toth May 30 '12 at 13:30
-7

It is better to add the condition in the Join. Performance is more important than readability. For large datasets, it matters.