3

I'm refactoring some sql at work, and stumbled on something I'm not sure how to explain. There are two queries I thought would result in the same result, but don't, and I'm not sure why.

The queries are as follows:

select *
from TableA as a
left join TableB b on a.id = b.id and b.status in (10, 100)

select *
from TableA as a
left join TableB b on a.id = b.id
where b.status is null or b.status in (10, 100)

When will these not return the same result?

Salman A
  • 262,204
  • 82
  • 430
  • 521
The Oddler
  • 6,314
  • 7
  • 51
  • 94
  • Since you have that `is null` condition in 2;nd query, they will return the same result. – jarlh Sep 11 '18 at 08:58
  • @jarlh I thought so too, but they don't. – The Oddler Sep 11 '18 at 09:00
  • try `where b.id is null or b.status in (10, 100)` in the second query, is that the same as the first? – Paul Maxwell Sep 11 '18 at 09:01
  • Oops, now I see. You should have `where b.id is null ...` instead. – jarlh Sep 11 '18 at 09:12
  • Your question mostly boils down to filtering the `status` in the `ON` clause or filtering in the `WHERE` clause. Obviously, in the latter case, you'll tend to lose more records. The `IS NULL` check isn't doing anything you intend for it to do; but it would also allow matching records which happen to have a `NULL` status to pass through to the result set. – Tim Biegeleisen Sep 11 '18 at 09:12

7 Answers7

6

The big difference with the Where condition b.status is null or b.status in (10, 100) is when b.status is say 1 as well as b.id=a.id

In the first query you will still get the row from table A with corresponding B part as NULL as On condition is not fully satisfied. In the second query you will get the row in the JOIN for both a and b tables which will be lost in the where clause.

DhruvJoshi
  • 17,041
  • 6
  • 41
  • 60
  • This makes a lot of sense! And I think conceptually the second one makes more sense for me then, but I'll have to check with some collegues. Thanks! – The Oddler Sep 11 '18 at 09:05
6

Let me show an example:

SELECT * INTO #A FROM (VALUES 
(1),(2),(3),(4)) T(id)

SELECT * INTO #B FROM (VALUES
(1,NULL),
(2,1),
(3,10)) T(id,status)

select *
from #A as a
left join #B b on a.id = b.id and b.status in (10, 100)

select *
from #A as a
left join #B b on a.id = b.id
where b.status is null or b.status in (10, 100)

Result

id          id          status
----------- ----------- -----------
1           NULL        NULL
2           NULL        NULL
3           3           10
4           NULL        NULL

id          id          status
----------- ----------- -----------
1           1           NULL
3           3           10
4           NULL        NULL

Final response:

  1. If status is not in (10,100), LEFT JOIN applies NULL
  2. If status is NULL, LEFT JOIN also applies NULL, predicate is not valid
Paweł Dyl
  • 8,888
  • 1
  • 11
  • 27
3

Logically, speaking, your second query is almost the same as:

SELECT *
FROM TableA as a
LEFT JOIN TableB b
    ON a.id = b.id
WHERE b.status IN (10, 100);  -- b.status is null has been removed

So then the problem comes down to the standard problem of filtering in the ON clause versus filtering in the WHERE clause. In the former case, all records from the left side of the join will be retained, even if the ON logic should fail. In the latter case, which is the case of your second query, matching records which fail the status condition would be removed, and not show up in the result set.

I said almost the same, because the b.status IS NULL check you had would in fact allow records to survive which did match in the join condition, but which happened to have a null value for the status. But, other than this, your question is really just one of filtering in the ON clause versus doing it in the WHERE clause.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
2
select * from TableA as a left join TableB b on a.id = b.id and b.status in (10, 100)

The conditional statement AND is evaluated before the join occurs.

select * from TableA as a left join TableB b on a.id = b.id
  where b.status is null or b.status in (10, 100)

The filter happens after the tables are joined.

SO this is the reason you will get different output.

Artem
  • 3,304
  • 3
  • 18
  • 41
Jignasha
  • 21
  • 1
2

Since it is a LEFT JOIN, a non-matching ON condition will generate a row where all columns from right table contain NULL.

On the other hand, a non-matching WHERE clause will eliminate the row completely regardless of join type. Consider this example:

CREATE TABLE #TableA(id INT);
INSERT INTO  #TableA VALUES
    (1),
    (2);
CREATE TABLE #TableB(id INT, status INT);
INSERT INTO  #TableB VALUES
    (1, 10),
    (2, -1);

SELECT *
FROM #TableA AS A
LEFT JOIN #TableB B ON A.id = B.id AND B.status IN (10)
/*
    a.id | b.id | status
    1    | 1    | 10
    2    | NULL | NULL
*/    

SELECT *
FROM #TableA AS A
LEFT JOIN #TableB B ON A.id = B.id
-- WHERE B.status IS NULL OR B.status IN (10)
/*
    a.id | b.id | status
    1    | 1    | 10
    2    | 2    | -1
*/

Notice that I have commented out the where clause in the second query (the results are different already). Once added, it will eliminate the second row as well.

Salman A
  • 262,204
  • 82
  • 430
  • 521
1

In your first query you'll get all rows of left table

But

In second where when you filter by where cluase, it will give only those record which full fill the where condition

MatBailie
  • 83,401
  • 18
  • 103
  • 137
Fahmi
  • 37,315
  • 5
  • 22
  • 31
1

In a normal scenario, A LEFT JOIN or LEFT OUTER JOIN gives all the rows from the left table with matched rows from both tables. When a row in the left table has no matching rows in the right table, the associated result set row contains null values for all select list columns coming from the right table.

When we add a where clause with a left outer join, it behaves like an inner join, where the filter is applied after the ON clause, showing only those rows having

B.status is null or 10 or 100

Jayasurya Satheesh
  • 7,826
  • 3
  • 22
  • 39