3

In other words, for a nested/multiple JOIN SQL statement, is it safe to say that one should always use INNER JOIN first (either put it at the top line or by using parentheses to first INNER JOIN two tables) and make sure it precedes any OUTER JOIN (LEFT, RIGHT, FULL)?


My understanding is that matching columns (e.g., Primary Key column and Foreign Key column) usually don't have NULL values. And any non-matching rows including NULL from an OUTER JOIN result would be removed when being INNER JOIN-ed by another table, simply because nothing would match a NULL!!

(BTW, I never JOINed any two tables using columns that both have NULL, therefore, I would not comment on whether a NULL value would match a NULL value when INNER JOIN-ing tables. At least, this would be extremely rare, I guess)

Nicholas
  • 2,560
  • 2
  • 31
  • 58

3 Answers3

3

It is a good general rule to start with inner joins and then follow with left joins. right joins are almost never really needed, and full joins are a special case. This is basically how I write queries.

However, it does depend on what the join conditions are. So, although I think the rules above are reasonable and sufficient for writing almost any query, it is possible to write queries with inner joins following outer joins.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • `However, it does depend on what the join conditions are...` Besides comparing the values from two columns, do you also use other types of conditions in a `JOIN` predicate? Heard it's better to use `WHERE` for all other kinds of conditions tho – Nicholas Mar 11 '19 at 02:04
  • @Nicholas . . . Any boolean conditions can go in an `on` clause, including `in` and `exists` with subqueries. The logic can be quite complicated . . . although such complication is usually unnecessary. – Gordon Linoff Mar 11 '19 at 02:07
  • 4
    `"right joins are almost never really needed"` > I've said for years that more often than not a RIGHT JOIN is simply a LEFT JOIN that needs to be refactored. – Shawn Mar 11 '19 at 02:08
  • @Nicholas A JOIN condition should be used to link the tables together and a WHERE should be used to filter those JOINed results. – Shawn Mar 11 '19 at 02:17
  • @Nicholas, if you want to filter what you left join to (example: you want to left join to TableA on ID but only if TableA.Status = 'Active'), you need to put the filter on the join, not in a where, otherwise you filter out any rows from your main table that have no matching TableA record at all (ie, you've just turned it into a inner join by using the where filter). – DancingFool Mar 11 '19 at 06:38
3

A subsequent inner join will only "essentially invalidate" an outer join if the inner join's ON clause requires should-be-optional rows to be present. In such a case, reordering the join either won't work or won't help; rather, the only fix is to change the inner join to an appropriate outer join.

So, for example, this works fine:

    SELECT *
      FROM person
 LEFT JOIN address
        ON person.address_id = address.id
INNER JOIN email
        ON person.email_id = email.id

and is equivalent to what you'd get if you moved the left outer join (lines 3–4) after the inner join (lines 5–6); whereas this does not work as intended:

    SELECT *
      FROM person
 LEFT JOIN address
        ON person.address_id = address.id
INNER JOIN city
        ON address.city_id = city.id

because the second ON clause can only be satisfied when address.city_id is non-null. (In this case the right fix is to change the inner join to a left outer join.)

That said, I do agree with Gordon Linoff that it's usually best to put your inner joins before your left outer joins; this is because inner joins tend to indicate more "essential" restrictions, so this ordering is usually more readable. (And I agree with both Gordon Linoff and Shawn that right outer joins are usually better avoided.)

ruakh
  • 175,680
  • 26
  • 273
  • 307
  • 1
    It's also worth noting that the server's Query Optimizer can reorder the JOINs in any way it feels will provide the optimal query plan. – Shawn Mar 11 '19 at 02:27
  • @Shawn Does parentheses affect this? – Nicholas Mar 11 '19 at 02:52
  • @Nicholas I think that would depend on what you were grouping inside the parentheses. What were you planning to use the parentheses around? Regardless, unless you use some specific hints to force your server to use a specific query plan, it'll still do what it thinks best. – Shawn Mar 11 '19 at 03:46
  • I think there is ambiguity over `"...is equivalent to what you'd get if you swapped the order of the two joins..."`. By swapping, do you mean 1. person inner join email, then person left join address, or 2. person inner join address, then person left join email? I feel like you are talking about the first scenario rather than literally swapping `JOIN` keywords as in the second scenario. But can I ensure that I understand you? – Nicholas Mar 11 '19 at 17:07
  • @Nicholas: Correct, I mean the first interpretation. I've now tweaked the answer to be a bit more explicit. – ruakh Mar 11 '19 at 20:25
1

There is no notion of having to do things in a certain order. There are consequences of a particular choice of expression.

left join on returns inner join on rows plus unmatched left table rows extended by nulls. Similarly, right join on returns inner join on rows plus unmatched right table rows extended by nulls. Similarly, full join on returns inner join on rows plus unmatched left and right table rows extended by nulls. Always know what inner join on you want as part of an outer join on.

After a left/right/full join on, an inner join on, where or having needing some column(s) of the right/left/2 [sic] table(s) to be not null removes rows with introduced nulls, leaving only inner join on rows. It "turns OUTER JOIN into INNER JOIN". You are talking about that.

Similarly, after a full join on, needing some column(s) of only the right/left [sic] table to be not null "turns FULL JOIN into RIGHT/LEFT [sic] JOIN".

You shouldn't have to worry about this. Just focus on writing queries that return what you want. Your question is like asking, should I avoid dividing by zero since it's undefined or adding zero because it doesn't do anything? Why would you, since it doesn't do what you want? If you are writing wrong queries then find out what the operators do.

Is there any rule of thumb to construct SQL query from a human-readable description?

PS My characterization of outer joins & of when null-extended rows are dropped focuses on the associated inner join, on the on as a whole & on whetherleft and/or right table columns are null. That's because the simplest definitions of the operators involve only those things. Your choice of organization of parts is misleading & impeding you.

  1. Any two tables can be joined on any condition. PK-FK equality is just a special case. (PKs & FKs are not needed to query. They do imply certain constraints on inputs & results though.) FK & unique columns & other columns can have nulls on input. (PK means unique not null.)
  2. "matching columns" & "nothing would match a NULL" are confused because it's row pairs that match or don't, per the whole condition of an on.
  3. "any non-matching rows including NULL from an OUTER JOIN result would be removed when being INNER JOIN"--No, it depends on the whole condition of an on or where.
  4. "this would be extremely rare"--Doesn't matter. Either something can happen or it can't happen; either the query is wrong or it's right.
philipxy
  • 14,867
  • 6
  • 39
  • 83