3

So I know how some things work in SQL but I don't know why and I'm unable to find a nice layman description of this online. For reference I'm using Oracle 11g and TOAD.

Question 1 - Outer Joins with Criteria

I know that if you place criteria on an outer joined table, you turn the query into an inner join regardless of your Syntax. So this query acts as an inner join:

SELECT a.field1, b.field1 
FROM tableA a 
LEFT JOIN tableB b on a.key = b.key 
WHERE b.field2 = 'someCriteria'

The way to get around this is to include an "OR IS NULL" in the second table's criteria. I know this to be true but I've never been able to wrap my head around why this is. Can someone explain why criteria on an outer table turns an outer join into an inner join?

Question 2 - Adding Criteria to Different Clauses Changes Results

So the above being true, I've been struggling with how the order of my criteria can alter the results of the following two queries. I have two tables - tableA and tableB - and need to do a left join compare of a subset of tableA to a subset of tableB.

SQL1

SELECT DISTINCT a.field1, b.field2
FROM tableA a 
LEFT JOIN tableB b 
on a.key = b.key
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)
WHERE a.field1 = 'othercriteria' 

Results: SQL1 gives me the correct left joined results.

SQL2

SELECT DISTINCT a.field1, b.field2
FROM tableA a 
LEFT JOIN tableB b 
on a.key = b.key 
WHERE a.field1 = 'othercriteria' 
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)

Results: SQL2 pulls back only an inner joined result of the two subsets (excluding those rows in tableA where tableB does not have a match).

Understanding The Results

The reason for this has something to do with the order in which the join and the where runs. I could understand this changing performance but I'm not following why it would change the results as the syntax is almost identical. To wrap my head around it, I ran the execution plans for both queries and got the following results (from TOAD):

Execution Plans:

SQL 1:

  • 5) Select Statement (Rows were returned by the Select statement)
  • 4) Sort Unique (The rows from step 3 were sorted to eliminate duplicate rows)
    • 3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash)
      • 1) Table Access Full TABLE tableB (Every row in the table tableB is read)
      • 2) Table Access Full TABLE tableA (Every row in the table tableA is read)

SQL 2:

  • 11) Select Statement (Rows were returned by the Select statement)
  • 10) Sort Unique (The rows from step 9 were sorted to eliminate duplicate rows)
  • 9) All distinct rows from steps 4, 8 were returned
    • 4) Filter (For the rows returned by step 3, filter out rows depending on filter criteria)
      • 3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash)
      • 1) Table Access Full TABLE tableB (Every row in the table tableB is read
      • 2) Table Access Full TABLE tableA (Every row in the table tableA is read)
    • 8) Filter (For the rows returned by step 7, filter out rows depending on filter criteria)
      • 7) Hash Join Right Outer (The result sets from steps 5, 6 were joined (hash)
      • 5) Table Access Full TABLE tableB (Every row in the table tableB is read)
      • 6) Table Access Full TABLE tableA (Every row in the table tableA is read)

So I have no doubt that the above execution plans explain perfectly why SQL 2 gives me different results than SQL 1 but I'm having a hard time reading these plans. Can someone help me translate these execution plans and explain why SQL2 is treated as an Inner Join because the tableA criteria is listed in the WHERE clause instead of the JOIN?

Thanks in advance!!

  • 1
    http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN This may help you unerstand. – HLGEM Oct 29 '13 at 14:47
  • Does this answer your question? [What is the difference between "INNER JOIN" and "OUTER JOIN"?](https://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join) – philipxy Mar 04 '21 at 11:39

2 Answers2

4

For question 1, consider table a containing two rows:

key       field1
1         a
2         b

And table b containing 3 rows:

key       field1       field2
1         c            someCriteria
1         d            notSomeCriteria
1         e            NULL

your FROM clause (with its JOINs) effectively generates a result set that looks like this:

(a)key     (a)field1     (b)key      (b)field1     (b)field2
1          a             1           c             someCriteria
1          a             1           d             notSomeCriteria
1          a             1           e             NULL
2          b             NULL        NULL          NULL

By the time the WHERE clause is considered, it no longer really "knows" whether a particular JOIN was successful or not - it doesn't selectively apply criteria based on whether or not the join succeeded. So if you've specified the b.field2 should equal someCriteria, you're saying that it should only return the first row (1,a,1,c,someCriteria).

If you want to make particular assertions about NULLable columns, you do really want the WHERE clause to act this way and force you to explicitly consider NULLs (whether those be generated from a NULL column or by a JOIN failing)

The cure I'd usually adopt is the one shown in HLGEM's answer, rather than adding OR b.field2 IS NULL since you usually want to exclude the (1,a,1,e,NULL) row.

Community
  • 1
  • 1
Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
  • 1
    Thanks Damien. That actually makes a lot of sense. I guess I never really focused on the JOIN as the initial query and the WHERE statement as a secondary filter. But when you break them out into each step that is very clear. –  Oct 29 '13 at 16:12
  • 1
    @DanK - another way to think about it is that the `FROM` clause is used to create the "shape" of the final result - it has to include everything that you want to produce. The `WHERE` clause is then used to trim off the bits that you don't actually want. – Damien_The_Unbeliever Oct 29 '13 at 19:51
2

I would write the first as:

SELECT a.field1, b.field1 
FROM tableA a 
LEFT JOIN tableB b on a.key = b.key 
AND b.field2 = 'someCriteria'

This would rerturn all records from table a and the b.field1 would only have data if b.field2 = 'somecriteria'

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • That's interesting... so are you saying that placing the AND inside the join is the same thing as Left Joining tableB as a nested query? For instance: LEFT JOIN (SELECT * from tableB where field2 = 'someCriteria') b on a.key = b.key –  Oct 29 '13 at 15:15