25

In this candidate answer it is asserted that JOIN is better than LEFT JOIN under some circumstances involving some WHERE clauses because it does not confuse the query planner and is not "pointless". The assertion/assumption is that it should be obvious to anyone.

Please explain further or provide link(s) for further reading.

Community
  • 1
  • 1
Dwayne Towell
  • 8,154
  • 4
  • 36
  • 49
  • 1
    Better? They serve a different purpose. JOIN is an inner join, LEFT JOIN is an outer join (the same thing as LEFT OUTER JOIN). Depending on your intentions you will either use an outer or inner join. The WHERE clause should not be used for join conditions. It should be used for other criteria; ie. filtering. – Brian DeMilia Jul 21 '14 at 23:47
  • 2
    I read the answer there as "Since the LEFT JOIN (with the WHERE) is *effectively* an INNER JOIN, just use an INNER JOIN instead." I'm not sure about how "confusing" it is to a [specific] query planner, but it's less than ideal for human [read: my] consumption, IMOHO. (The same can be said for using a CROSS JOIN when an INNER would suffice, eg.) – user2864740 Jul 21 '14 at 23:55
  • 1
    A left join with a where clause THAT USES THE OUTER JOINED TABLE is effectively an inner join. However if the where clause does not use the outer joined table, no, that is not the case. – Brian DeMilia Jul 22 '14 at 00:05

2 Answers2

41

Effectively, WHERE conditions and JOIN conditions for [INNER] JOIN are 100 % equivalent in PostgreSQL. (It's good practice to use explicit JOIN conditions to make queries easier to read and maintain, though).

The same is not true for a LEFT JOIN combined with a WHERE condition on a table to the right of the join. The purpose of a LEFT JOIN is to preserve all rows on the left side of the join, irregardless of a match on the right side. If no match is found, the row is extended with NULL values for columns on the right side. The manual:

LEFT OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each row in T1.

If you then apply a WHERE condition that requires something else than a NULL value on columns of tables on the right side, you void the effect and forcibly convert the LEFT [OUTER] JOIN to work like a plain [INNER] JOIN, just (possibly) more expensively due to a more complicated query plan.

In a query with many joined tables, Postgres (or any RDBMS) is hard put to it to find the best (or even a good) query plan. The number of theoretically possible sequences to join tables grows factorially (!). Postgres uses the "Generic Query Optimizer" for the task and there are some settings to influence it.

Obfuscating the query with misleading LEFT JOIN as outlined, makes the work of the query planner harder, is misleading for human readers and typically hints at errors in the query logic.

Related answers for problems stemming from this:

Etc.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • If it really "more expensively"? Modern query planners - granted, I predominately use SQL Server - impress me. – user2864740 Jul 22 '14 at 00:25
  • 3
    What difference does it make if it is or is not more expensive. Why would you use an outer join with table X if you're going to say that values on table X should be something (other than null)? If anything, don't do it simply because it doesn't make sense. – Brian DeMilia Jul 22 '14 at 00:29
  • @user2864740: It won't matter much in simple cases. It makes the job of the query optimizer harder in complex cases, though. But as Brian commented: don't do it either way. It's also very confusing for the human reader. – Erwin Brandstetter Jul 22 '14 at 00:38
  • @ErwinBrandstetter, thank you for a very nice answer, sorry for the earlier confusion, Brian's example made things very clear so I went with his answer. – Dwayne Towell Jul 22 '14 at 01:36
  • @DwayneTowell: I added some examples to illustrate. – Erwin Brandstetter Jul 22 '14 at 01:43
31

Consider the following example. We have two tables, DEPARTMENTS and EMPLOYEES.

Some departments do not yet have any employees.

This query uses an inner join that finds the department employee 999 works at, if any, otherwise it shows nothing (not even the employee or his or her name):

select a.department_id, a.department_desc, b.employee_id, b.employee_name
  from departments a
  join employees b
    on a.department_id = b.department_id
 where b.employee_id = '999'

This next query uses an outer join (left between departments and employees) and finds the department that employee 999 works for. However it too will not show the employee's ID or his or her name, if they do not work at any departments. That is because of the outer joined table being used in the WHERE clause. If there is no matching department, it will be null (not 999, even though 999 exists in employees).

select a.department_id, a.department_desc, b.employee_id, b.employee_name
  from departments a
  left join employees b
    on a.department_id = b.department_id
 where b.employee_id = '999'

But consider this query:

select a.department_id, a.department_desc, b.employee_id, b.employee_name
  from departments a
  left join employees b
    on a.department_id = b.department_id
   and b.employee_id= '999'

Now the criteria is in the on clause. So even if this employee works at no departments, he will still be returned (his ID and name). The department columns will be null, but we get a result (the employee side).

You might think you would never want to use the outer joined table in the WHERE clause, but that is not necessarily the case. Normally it is, for the reason described above, though.

Suppose you want all departments with no employees. Then you could run the following, which does use an outer join, and the outer joined table is used in the where clause:

select a.department_id, a.department_desc, b.employee_id
  from departments a
  left join employees b
    on a.department_id = b.department_id
 where b.employee_id is null

^^ Shows departments with no employees.

The above is likely the only legitimate reason you would want to use an outer joined table in the WHERE clause rather than the ON clause (which I think is what your question is; the difference between inner and outer joins is an entirely different topic).

A good way to look at is this: You use outer joins to allow nulls. Why would you then use an outer join and say that a field should not be null and should be equal to 'XYZ'? If a value has to be 'XYZ' (not null), then why instruct the database to allow nulls to come back? It's like saying one thing and then overriding it later.

Brian DeMilia
  • 13,103
  • 1
  • 23
  • 33
  • 5
    The third example in this answer is wrong. The employee will not be returned in this case, that query will return *all departments* and *any associated employee with an id of 999*. If the 999 employee is not associated with any departments they will not be returned in the result. – Nathan Griffiths Mar 23 '18 at 03:08
  • Looks like queries above and below "But consider this query" are same. – Neeraj Mar 06 '20 at 17:39
  • 1
    Difference is on the WHERE vs AND (that stay part of the LEFT JOIN "ON") – naitan Oct 26 '20 at 02:33