2

I have wrote 2 SQL statements.

SELECT 
customers.CustomerID,
orders.OrderID
FROM customers LEFT JOIN orders ON customers.CustomerID = 
orders.CustomerID AND orders.EmployeeID=4
WHERE orders.OrderID IS NULL;

And

SELECT 
customers.CustomerID,
orders.OrderID
FROM customers LEFT JOIN orders ON customers.CustomerID = 
orders.CustomerID AND orders.EmployeeID=4 AND orders.OrderID IS NULL;

The 1st one returns 16 entries, which is correct (per the solution). The 2nd one returns 91 entries. I am using Northwind for mysql version. I read from somewhere that placing condition in JOIN statement is equivalent to doing that in WHERE statement. However, here I can see the difference.

halfer
  • 19,824
  • 17
  • 99
  • 186
hacaoideas
  • 105
  • 8
  • 1
    See https://dba.stackexchange.com/questions/196030/filter-condition-difference-where-clause-vs-join-condition "When you have a TableA a LEFT JOIN TableB b scenario, then you have to be careful how you use TableB fields in your WHERE clause. For any rows in TableA where there is no matching row in TableB, all fields in TableB will be set to NULL." – DotNetDev Aug 06 '19 at 13:45
  • 1
    Didn't knew it could be done like that. However, I will still place conditions in WHERE, easier to read to me. I'm also interested in the answer. @DotNetDev You may consider create an answer and explain with with the link as a back-up source. Don't answer in comments. – Zoma Aug 06 '19 at 13:45
  • The answer to [this question](https://stackoverflow.com/questions/8311096/whats-the-difference-between-where-clause-and-on-clause-when-table-left-join) might help. – dvo Aug 06 '19 at 13:45
  • The second one will produce more rows. – The Impaler Aug 06 '19 at 13:47

3 Answers3

2

This is to do with the order execution on the statements and the fact you’re using a LEFT JOIN.

A LEFT JOIN will keep all the values from the left side intact, combining only values from the right side that match conditions.

A WHERE clause operates on the query as a whole (in this case).

Query 1:

  1. Get all values from left table
  2. Join on any values from the right table that match conditions
  3. Filter the joined output based on the where condition

Query 2:

  1. Get all values from the left table
  2. Join on any values from the right table that match conditions

The only time a WHERE is really analogous to a JOIN is something like a INNER JOIN, which gets only the relevant values from both left and right sides. I think at least, it has been a while since I flexed the SQL muscles.

Edit - count the number of rows (nothing else) in your customers table, it should return 91 as well.

dijksterhuis
  • 1,225
  • 11
  • 25
  • It's true, returning 91 rows. – hacaoideas Aug 06 '19 at 14:01
  • @hacaoideas counting table rows, distinct counts of columns etc are really useful debugging tools in SQL :) I try to find out some general info about a table before running any query on it. – dijksterhuis Aug 06 '19 at 14:03
  • I have a doubt. The condition of the join is ON ((customers.CustomerID = orders.CustomerID) AND (orders.EmployeeID=4) AND (orders.OrderID IS NULL)) why doesnt it join only when these 3 conditions are satisfied? – hacaoideas Aug 06 '19 at 14:05
  • @hacaoideas it does only operate when the conditions are satisfied. it will only retrieve rows that match those conditions from the right table, then join them onto the left. As it’s a `LEFT JOIN`, the left table doesn’t get modified. Trying running `RIGHT`, `INNER` and `OUTER` joins and see how the results change. – dijksterhuis Aug 06 '19 at 14:14
  • I just view the full return set of query 2 (select * from ...) and I see that all the values for EmployeeID are NULL instead of 4 as specified in the condition. – hacaoideas Aug 06 '19 at 14:27
1

The second query will return more rows. Why? Read on:

Second query

The second query will match orders rows to customers rows using the predicate:

customers.CustomerID = orders.CustomerID AND orders.EmployeeID=4 AND orders.OrderID IS NULL

Unmatched customers rows will always show up in the result.

First query

Now, the first query will match orders rows to customers rows using the predicate:

customers.CustomerID = orders.CustomerID AND orders.EmployeeID=4

Then it will filter out rows where orders.OrderID is null, removing customer rows that do not fullfil this last predicate, therefore producing less rows. Unmatched customers rows may not show up in the result.

Example:

create table customers (
  CustomerId int
);

create table orders (
  OrderId int,
  CustomerId int,
  EmployeeId int
);

insert into customers (CustomerId) values (1), (2), (3);

insert into orders (OrderId, CustomerId, EmployeeId) values
  (1001, 1, 3),
  (1002, 1, 4),
  (1003, 2, 1);

The first query returns:

CustomerId  OrderId      
----------  ----------------
2           <null>       
3           <null>      

while the second query returns:

CustomerId  OrderId      
----------  ----------------
1           <null>       
2           <null>       
3           <null>       
The Impaler
  • 45,731
  • 9
  • 39
  • 76
0

This could be because of the order in which the Sql query runs FROM and JOIN s. The FROM clause, and subsequent JOIN s are first executed then WHERE and lastly SELECT.

In second case all the join conditions ran together but in the first case the where clause made the difference after joining the data and filtering not at the same time but as a whole after joining.

Himanshu
  • 3,830
  • 2
  • 10
  • 29