Is there any difference in specifying a predicate in the ON or WHERE clause for an OUTER join?

Question

If I have 2 tables A and B like this:

A:

id   name
1     XXX


B:
id   name
1

If i say:

select * from A left join B on a.id = b.id and b. name is not null.

Should I get any rows back? Basically is the above query different from this query which will return 0 rows:

select * from A left join B on a.id = b.id where b. name is not null.

Did you try running both of these queries to get the result and compare them? — Taryn, Jun 09 '14 at 21:22

score 2 · Answer 1 · answered Jun 09 '14 at 21:41

The queries are different.

Consider this data:

A:
id   name
1    XXX
2    YYY
3    ZZZ

B:
id   name
1    xxx
3    null

The first query keeps the join from matching on records from B where the name is null, so it returns:

id  name    id    name
1   XXX     1     xxx
2   YYY     null  null
3   ZZZ     null  null

Notice that the id from B is null in the third record, because the join didn't match.

The second query matches records from B even if the name is null, but then it filters out all records where the name from B ends up as null (either from being null to start with, or because there is no matching record), so it returns:

id  name    id    name
1   XXX     1     xxx

mc110 · Answer 2 · 2014-06-09T21:40:57.260

Predicates in the ON clause are evaluated at join time, whereas predicates in the WHERE clause are post-join filters. With an INNER join this does not matter, but with an OUTER JOIN it does.

So in your example, the first query should return one row (the row from B will not match on the join, but because it is a LEFT OUTER the row from A will be matched to all NULLs from B), and the second query should return zero rows (as the WHERE clause will be applied after the join, and filter out the one resulting row).

http://www-01.ibm.com/support/knowledgecenter/SSGU8G_11.50.0/com.ibm.sqls.doc/ids_sqs_1033.htm?lang=en refers to post-join filters.

Output from postgres:

postgres=# select * from A left join B on a.id = b.id and b.name is not null;

id | name | id | name ----+------+----+------ 1 | XXX | | (1 row)

postgres=# select * from A left join B on a.id = b.id where b.name is not null;

id | name | id | name ----+------+----+------ (0 rows)

postgres=#

score 0 · Accepted Answer · answered Jun 09 '14 at 21:42

They are not the same.

Putting the compound condition on the join tells the database to evaluate BEFORE the cartesean is generated. So in the 1st query. All records from A will be returned and those records which match A in B on ID will be returned if the name is not null.

In your second query the cartesean is generated first and then the not null evaluation occurs.

Expanding on the data provided

A:
ID name
1  xxx
2  yyy
3  zzz

B:
ID name
1  
2  Bob

1st example would return:

1 XXX NULL NULL
2 YYY 2    BOB
3 ZZZ NULL NULL

All are returned as you indicated you wanted ALL Records, and only those records from B where the name wasn't null. but since the Null value identification occurs PRIOR to the Cartesian, only record 1 would be eliminated from table B But since it's in the section A where you wanted ALL records it would be included in the results.

2nd query would return

2 YYY 2    BOB

1 and 3 get EXCLUDED because Name is null, and this evaluation occurs AFTER the cartesean is generated.

Is there any difference in specifying a predicate in the ON or WHERE clause for an OUTER join?

3 Answers3