So I know how some things work in SQL but I don't know why and I'm unable to find a nice layman description of this online. For reference I'm using Oracle 11g and TOAD.
Question 1 - Outer Joins with Criteria
I know that if you place criteria on an outer joined table, you turn the query into an inner join regardless of your Syntax. So this query acts as an inner join:
SELECT a.field1, b.field1
FROM tableA a
LEFT JOIN tableB b on a.key = b.key
WHERE b.field2 = 'someCriteria'
The way to get around this is to include an "OR IS NULL" in the second table's criteria. I know this to be true but I've never been able to wrap my head around why this is. Can someone explain why criteria on an outer table turns an outer join into an inner join?
Question 2 - Adding Criteria to Different Clauses Changes Results
So the above being true, I've been struggling with how the order of my criteria can alter the results of the following two queries. I have two tables - tableA and tableB - and need to do a left join compare of a subset of tableA to a subset of tableB.
SQL1
SELECT DISTINCT a.field1, b.field2
FROM tableA a
LEFT JOIN tableB b
on a.key = b.key
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)
WHERE a.field1 = 'othercriteria'
Results: SQL1 gives me the correct left joined results.
SQL2
SELECT DISTINCT a.field1, b.field2
FROM tableA a
LEFT JOIN tableB b
on a.key = b.key
WHERE a.field1 = 'othercriteria'
AND (b.field2 = 'somecriteria' or b.field2 IS NULL)
Results: SQL2 pulls back only an inner joined result of the two subsets (excluding those rows in tableA where tableB does not have a match).
Understanding The Results
The reason for this has something to do with the order in which the join and the where runs. I could understand this changing performance but I'm not following why it would change the results as the syntax is almost identical. To wrap my head around it, I ran the execution plans for both queries and got the following results (from TOAD):
Execution Plans:
SQL 1:
- 5) Select Statement (Rows were returned by the Select statement)
- 4) Sort Unique (The rows from step 3 were sorted to eliminate duplicate rows)
- 3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash)
- 1) Table Access Full TABLE tableB (Every row in the table tableB is read)
- 2) Table Access Full TABLE tableA (Every row in the table tableA is read)
- 3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash)
SQL 2:
- 11) Select Statement (Rows were returned by the Select statement)
- 10) Sort Unique (The rows from step 9 were sorted to eliminate duplicate rows)
- 9) All distinct rows from steps 4, 8 were returned
- 4) Filter (For the rows returned by step 3, filter out rows depending on filter criteria)
- 3) Hash Join Right Outer (The result sets from steps 1, 2 were joined (hash)
- 1) Table Access Full TABLE tableB (Every row in the table tableB is read
- 2) Table Access Full TABLE tableA (Every row in the table tableA is read)
- 8) Filter (For the rows returned by step 7, filter out rows depending on filter criteria)
- 7) Hash Join Right Outer (The result sets from steps 5, 6 were joined (hash)
- 5) Table Access Full TABLE tableB (Every row in the table tableB is read)
- 6) Table Access Full TABLE tableA (Every row in the table tableA is read)
- 4) Filter (For the rows returned by step 3, filter out rows depending on filter criteria)
So I have no doubt that the above execution plans explain perfectly why SQL 2 gives me different results than SQL 1 but I'm having a hard time reading these plans. Can someone help me translate these execution plans and explain why SQL2 is treated as an Inner Join because the tableA criteria is listed in the WHERE clause instead of the JOIN?
Thanks in advance!!