6

I am trying to compare two tables to find rows in each table that is not in the other. Table 1 has a groupby column to create 2 sets of data within table one.

groupby     number
----------- -----------
1           1
1           2
2           1
2           2
2           4

Table 2 has only one column.

number
-----------
1
3
4

So Table 1 has the values 1,2,4 in group 2 and Table 2 has the values 1,3,4.

I expect the following result when joining for Group 2:

`Table 1 LEFT OUTER Join Table 2`
T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
2           2           NULL

`Table 2 LEFT OUTER Join Table 1`
T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
NULL        NULL        3

The only way I can get this to work is if I put a where clause for the first join:

PRINT 'Table 1 LEFT OUTER Join Table 2, with WHERE clause'
select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table1
        LEFT OUTER join table2
        --******************************
        on table1.number = table2.number
        --******************************
WHERE   table1.groupby = 2
    AND table2.number IS NULL

and a filter in the ON for the second:

PRINT 'Table 2 LEFT OUTER Join Table 1, with ON clause'
select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table2
        LEFT OUTER join table1
            --******************************
            on table2.number = table1.number
            AND table1.groupby = 2
            --******************************
WHERE   table1.number IS NULL

Can anyone come up with a way of not using the filter in the on clause but in the where clause?

The context of this is I have a staging area in a database and I want to identify new records and records that have been deleted. The groupby field is the equivalent of a batchid for an extract and I am comparing the latest extract in a temp table to a the batch from yesterday stored in a partioneds table, which also has all the previously extracted batches as well. Code to create table 1 and 2:

create table table1 (number int, groupby int)
create table table2 (number int)
insert into table1 (number, groupby) values (1, 1)
insert into table1 (number, groupby) values (2, 1)
insert into table1 (number, groupby) values (1, 2)
insert into table2 (number) values (1)
insert into table1 (number, groupby) values (2, 2)
insert into table2 (number) values (3)  
insert into table1 (number, groupby) values (4, 2)  
insert into table2 (number) values (4)  

EDIT:

A bit more context - depending on where I put the filter I different results. As stated above the where clause gives me the correct result in one state and the ON in the other. I am looking for a consistent way of doing this.

Where -

select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table1
        LEFT OUTER join table2
            --******************************
            on table1.number = table2.number
            --******************************
WHERE   table1.groupby = 2 
    AND table2.number IS NULL

Result:

T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
2           2           NULL

On -

select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table1
        LEFT OUTER join table2
            --******************************
            on table1.number = table2.number
            AND table1.groupby = 2
            --******************************
WHERE   table2.number IS NULL

Result:

T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
1           1           NULL
2           2           NULL
1           2           NULL

Where (table 2 this time) -

select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table2
        LEFT OUTER join table1
            --******************************
            on table2.number = table1.number
            AND table1.groupby = 2
            --******************************
WHERE   table1.number IS NULL

Result:

T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
NULL        NULL        3

On -

select  table1.groupby as [T1_Groupby],
        table1.number as [T1_Number],
        table2.number as [T2_Number]
from    table2
        LEFT OUTER join table1
            --******************************
            on table2.number = table1.number
            --******************************
WHERE   table1.number IS NULL
    AND table1.groupby = 2

Result:

T1_Groupby  T1_Number   T2_Number
----------- ----------- -----------
(0) rows returned
Greg Potter
  • 63
  • 1
  • 1
  • 4
  • 1
    Why is it important to move the condition from the `JOIN` clause into the `WHERE` clause? Putting predicates in the `JOIN` is a normal thing to do when you want this type of behaviour. – Aaronaught May 28 '10 at 14:57
  • Moving it to the where clause is not as important as the fact that for the first join where I join table 1 to table 2 I get a different result if I put the filter in the on and not in the where. PRINT 'Table 1 LEFT OUTER Join Table 2, with WHERE clause' select table1.groupby as [T1_Groupby], table1.number as [T1_Number], table2.number as [T2_Number] from table1 LEFT OUTER join table2 --****************************** on table1.number = table2.number AND table1.groupby = 2 --****************************** WHERE --table1.groupby = 2 AND table2.number IS NULL Gives null only – Greg Potter May 28 '10 at 15:06
  • Well, which one do you want? It sounds like the NULL-join does what you mean. This is a standard way to do queries like this and is typically preferable to the subquery alternatives. I see no benefit in trying to push the condition into a WHERE clause. – bobince May 28 '10 at 15:23
  • Please check the EDIT part I added - might explain better. – Greg Potter May 28 '10 at 15:25
  • SQL-92 is an abomination. – Sam Jul 08 '13 at 07:47
  • +1 for such a well-written question! – Kevin Babcock Jul 26 '13 at 14:57

7 Answers7

13

If you filter the left outer joined table in the WHERE clause then you are in effect creating an inner join

See also this wiki page: WHERE conditions on a LEFT JOIN

SQLMenace
  • 132,095
  • 25
  • 206
  • 225
  • That link has gone stale. – O. R. Mapper Jun 18 '20 at 10:56
  • @O.R.Mapper - An alternate answer since the link is dead. Please upvote if you find it useful https://stackoverflow.com/a/63735594/6648326 My answer is a copy-paste, but it took initiative and effort to put it here. – MasterJoe Sep 04 '20 at 05:59
7

with LEFT OUTER JOINS, you must filter in the ON clause or use this:

WHERE
    (LeftJoinTable.ID IS NULL OR LeftJoinTable.Col1=YourFilter)

if you just filter in the WHERE:

WHERE 
    LeftJoinTable.Col1=YourFilter

you will discard the parent joining row whenever there isn't a LeftJoinTable.ID (making the join an INNER JOIN).

By placing the filter in the ON, you cause the elimination of the LEFT JOIN row but not the elimination of the parent joining row, that is just how it works.

EDIT base don OP's comment
the only way to filter a a LEFT OUTER JOIN table is in the ON clause, unless you want to use an OR like I show in the first code example above. There is nothing wrong filtering a LEFT OUTER JOIN in the ON clause, this is how you do it.

KM.
  • 101,727
  • 34
  • 178
  • 212
  • 1
    I probably (hindsight being the perfect science) should have put this in the original question, but if you look at my comment above you will see that changing the filter from the where to the on first either joins gets you a different result. I am looking for a 'rule' when coding left outer joins that I should either use ON clauses or where clauses. – Greg Potter May 28 '10 at 15:09
  • Various answers such as [this one](https://stackoverflow.com/a/151110/1430156), or [that blogpost](https://www.xaprb.com/blog/2005/09/23/how-to-write-a-sql-exclusion-join/), claim the opposite. Is there any contextual difference I'm missing? – O. R. Mapper Jun 18 '20 at 11:17
3

As the query is written, it makes sense to put the join in the ON clause, since you specifically only want to join on values in group '2' from table 1.

The alternative is to prefilter table1 to the group you are interested in, like this

select  t1Group.groupby,
        t1Group.number as [T1_Number],
        table2.number as [T2_Number]
from    table2
        LEFT OUTER join (SELECT * FROM table1 WHERE groupby=2) t1Group
            on table2.number = t1Group.number
WHERE   t1Group.number IS NULL
mdma
  • 56,943
  • 12
  • 94
  • 128
  • +1 the key to understanding why filtering in the "ON" clause works is to take the filter into an inline view like this. – araqnid May 28 '10 at 15:14
0
SELECT  dbo.table1.groupby as [T1_Groupby],
        dbo.table1.number as [T1_Number],
        t21.number as [t21_Number]
FROM    dbo.table1
LEFT OUTER join dbo.table2 t21
    ON dbo.table1.number = t21.number
LEFT OUTER join dbo.table2 t22
    ON dbo.table1.groupby= t22.number
WHERE t21.number is null AND t22.number is null
bensiu
  • 24,660
  • 56
  • 77
  • 117
angel
  • 1
0
    select  dbo.table1.groupby as [T1_Groupby],
                            dbo.table1.number as [T1_Number],
                            t22.number as [t22_Number]

                    from    dbo.table1 right outer join 
                    (select  dbo.table1.groupby,
                            dbo.table2.number as number

                    from    dbo.table1
                    right OUTER join dbo.table2
                    on dbo.table1.number = dbo.table2.number

                    where dbo.table1.number is null) t22
                    on dbo.table1.groupby = t22.number
                    where dbo.table1.groupby is null
angel
  • 1
0

I have been struggling with this myself - and at the end of the day was to select data from table with Where Clause and put it into a temp table, and then use Left outer join on the Temp Table.

SELECT table1.GroupBy, table1.number INTO #Temp FROM table1 WHere GroupBy = 2
SELECT table2.Groupby, #temp.number From table2 LEFT OUTER JOIN #temp on table2.Groupby = #temp.Groupby
Pang
  • 9,564
  • 146
  • 81
  • 122
KMW
  • 1
0

The link in the top answer is not working anymore. Here is another link/blog post which answers this question. I have copy pasted the main contents (not all contents) of that link here so that we don't lose the knowledge if that link also stops working.

TL;DR: Be careful with left joins because a LEFT JOIN query may or may not be executed as a LEFT JOIN. Its weird, but true.

Create sample test data: (In temporary tables)

DECLARE @Table1 TABLE (colID int, colVal varchar(5));
DECLARE @Table2 TABLE (columnID int, columnVal varchar(15));
  
INSERT @Table1 VALUES (1,'one'),(2,'two'),(3,'three'),(4,'four'),(5,'five');
INSERT @Table2 VALUES (1,'some value'),(3,'blah blah blah'),(5,'hello world'),(12,'howdy');

If we want to return all the records in Table1 regardless of whether there is an associated record in Table2 and show the data from Table2 when there is an associated record we'd write a LEFT JOIN, like so:

SELECT *
  FROM @Table1 tb1
        LEFT OUTER JOIN @Table2 tb2
          ON tb1.colID = tb2.columnID;

enter image description here

However, if we now want to add a WHERE clause to the query to only get the data from Table2 where the ID is less than 4 we might do something like this:

enter image description here

See that we only get the values where there is a matching ID in both tables. That looks like an INNER JOIN and it is actually executed as an inner join. To confirm that, see the execution plan after running the LEFT JOIN with and without the WHERE condition. (If the readers want to get into this detail, then please go to the original author's blog post)

So how can we get around this? Well, since you're doing a LEFT JOIN to begin with you obviously need or want to return all the records from Table1 regardless of the data in Table2. And if you really don't want to return data for certain records in Table2 then you can filter those records out in your JOIN predicate (the "search criteria" in the ON portion of the JOIN). For example, the last query would be written like this instead:

SELECT *
  FROM @Table1 tb1
        LEFT OUTER JOIN @Table2 tb2
          ON tb1.colID = tb2.columnID
         AND tb2.columnID < 4;

enter image description here

See that we still get the 5 records from Table1, but not the data from Table2 which don't meet our criteria. It just returns NULLs for the data in that table which doesn't meet the criteria. We can look at the execution plan again to prove that we are actually using a LEFT JOIN. (If the readers want to get into this detail, then please go to the original author's blog post)

More notes: What about when instead of a value someone uses the IS NULL condition instead? Well, in that case you can use that in the WHERE clause.

What if we use IS NOT NULL instead? It will do an INNER JOIN just like it would with an actual value.

We can verify the above claims by looking at the execution plan as mentioned in the original author's blog post).

MasterJoe
  • 2,103
  • 5
  • 32
  • 58