29

It's easy to understand why left outer joins are not commutative, but I'm having some trouble understanding whether they are associative. Several online sources suggest that they are not, but I haven't managed to convince myself that this is the case.

Suppose we have three tables: A, B, and C.

Let A contain two columns, ID and B_ID, where ID is the primary key of table A and B_ID is a foreign key corresponding to the primary key of table B.

Let B contain two columns, ID and C_ID, where ID is the primary key of table B and C_ID is a foreign key corresponding to the primary key of table C.

Let C contain two columns, ID and VALUE, where ID is the primary key of table C and VALUE just contains some arbitrary values.

Then shouldn't (A left outer join B) left outer join C be equal to A left outer join (B left outer join C)?

Tianxiang Xiong
  • 3,887
  • 9
  • 44
  • 63
  • 5
    There's an implicit assumption in your question that the first join only involves columns of A and B, and the second only involves columns of B and C, that perhaps should be made more explicit. If we're talking about any arbitrary `LEFT OUTER JOIN`, then we can easily imagine that one of the `JOIN`'s has an `ON` clause of, say, `A.id + B.id + C.id = 10`. In that case, clearly they're not associative - one of the possible orderings of the JOINs isn't even a legal query. – Mark Amery Nov 16 '13 at 18:49
  • Can you give me an example of how the condition `A.id + B.id + C.id = 10` can be used? I assume it'd have to be the "outer" join predicate, e.g. `(A left outer join B on A.B_ID = B.ID) left outer join C on A.ID + B.ID + C.ID = 10`. – Tianxiang Xiong Nov 16 '13 at 19:33
  • Here's an example: http://sqlfiddle.com/#!2/0d462/10, although it's a bit crap to look at on SQLFiddle because SQLFiddle erroneously only shows one 'id' column in the result set, even though we weren't JOINing based upon the 'id' columns being equal. – Mark Amery Nov 16 '13 at 19:51
  • possible duplicate of [Does the join order matters in SQL?](http://stackoverflow.com/questions/9614922/does-the-join-order-matters-in-sql) – ypercubeᵀᴹ Nov 16 '13 at 20:09
  • (Per the first comment above:) SQL non-natural inner & outer join are not binary operators. Each takes two tables & a condition that represents a function from a row to a boolean. So it does not make sense to ask whether one is commutive or associative. Explain what you mean without using those words. In the case of commutative it seems likely you are talking about (t1 join t2 on c) = (t2 join t1 on c) but the "associative" case is not clear, with 3 tables & 2 conditions. Your pseudo-code forgets the conditions. You also seem to be assuming restrictions on conditions that you need to give. – philipxy Apr 06 '19 at 21:38

3 Answers3

24

In this thread, it is said, that they are not associative: Is LEFT OUTER JOIN associative?

However, I've found some book online where it is stated, that OUTER JOINs are associative, when the tables on the far left side and far right side have no attributes in common (here).

Here is a graphical presentation (MSPaint ftw):

Graphical presentation of impact of left outer joins order

Another way to look at it:

Since you said that table A joins with B, and B joins with C, then:

  • When you first join A and B, you are left with all records from A. Some of them have values from B. Now, for some of those rows for which you got value from B, you get values from C.
  • When you first join B and C, you and up with the whole table B, where some of the records have values from C. Now, you take all records from A and join some of them with all rows from B joined with C. Here, again, you get all rows from A, but some of them have values from B, some of which have values from C.

I don't see any possibility where, in conditons described by you, there would be a data loss depending on the sequence of LEFT joins.

Basing on the data provided by Tilak in his answer (which is now deleted), I've built a simple test case:

CREATE TABLE atab (id NUMBER, val VARCHAR2(10));
CREATE TABLE btab (id NUMBER, val VARCHAR2(10));
CREATE TABLE ctab (id NUMBER, val VARCHAR2(10));

INSERT INTO atab VALUES (1, 'A1');
INSERT INTO atab VALUES (2, 'A2');
INSERT INTO atab VALUES (3, 'A3');

INSERT INTO btab VALUES (1, 'B1');
INSERT INTO btab VALUES (2, 'B2');
INSERT INTO btab VALUES (4, 'B4');

INSERT INTO ctab VALUES (1, 'C1');
INSERT INTO ctab VALUES (3, 'C3');
INSERT INTO ctab VALUES (5, 'C5');

SELECT ab.aid, ab.aval, ab.bval, c.val AS cval
  FROM (
    SELECT a.id AS aid, a.val AS aval, b.id AS bid, b.val AS bval
      FROM atab a LEFT OUTER JOIN btab b ON (a.id = b.id)
    ) ab
    LEFT OUTER JOIN ctab c ON (ab.bid = c.id)
ORDER BY ab.aid
;
       AID AVAL       BVAL       CVAL     
---------- ---------- ---------- ----------
         1 A1         B1         C1         
         2 A2         B2                    
         3 A3                               
SELECT a.id, a.val AS aval, bc.bval, bc.cval
  FROM
    atab a
    LEFT OUTER JOIN (
      SELECT b.id AS bid, b.val AS bval, c.id AS cid, c.val AS cval
        FROM btab b LEFT OUTER JOIN ctab c ON (b.id = c.id)
    ) bc
      ON (a.id = bc.bid)
ORDER BY a.id
;
        ID AVAL       BVAL       CVAL     
---------- ---------- ---------- ----------
         1 A1         B1         C1         
         2 A2         B2                    
         3 A3                            

It seems in this particular example, that both solutions give the same result. I can't think of any other dataset that would make those queries return different results.

Check at SQLFiddle:

Community
  • 1
  • 1
Przemyslaw Kruglej
  • 8,003
  • 2
  • 26
  • 41
  • 1
    "I've found some book online where it is said, that OUTER JOINs are associative, when the table on the far left side and far right side have no attributes in common" - I think this statement regards only natural outer joins, where joining is done on the common attributes. The question mentions qualified foreign key names (A_ID), so explicit ON clauses will be required. – Fabian Nov 16 '13 at 19:34
  • @Fabian I don't think you're right on this one... The way we are joining here is based on different attributes for each relation, and the A and C tables are not directly related, so, in this case, I think it is like the book says. – Przemyslaw Kruglej Nov 16 '13 at 19:47
  • @PrzemyslawKruglej Thank you. You made me realize that under certain conditions, the outer joins are associative and I corrected a previous answer of mine. – ypercubeᵀᴹ Nov 16 '13 at 21:37
  • @Przemyslaw: Yes, the argument holds for that case, too. But in general, each joins condition relates two different lists of attributes (assuming an equi-join). So one has to generalize what "common attributes" means here. Furthermore, when there are ON conditions, the notion of associativity can regard both parts, the table clause and the join condition. Both may or may not be re-arranged associatively. – Fabian Nov 17 '13 at 10:38
  • @PrzemyslawKruglej: ... this is described nicely in [Michael M. David, "Advanced ANSI SQL Data Modeling and Structure Processing", p 19](http://books.google.de/books?id=6aMJH43GwpIC&pg=PA18&lpg=PA18&dq=SQL+associativity+of+outer+joins&source=bl&ots=Mn5-6BUiJf&sig=ZOyMFP0t8ZsHsN-UEIX5LvrhnWM&hl=de&sa=X&ei=c8mHUrvGL8rEswa5nYGwCw&ved=0CDMQ6AEwAA#v=onepage&q=SQL%20associativity%20of%20outer%20joins&f=false) – Fabian Nov 17 '13 at 10:38
  • 2
    @Fabian I see what you mean, thanks for the link. In my answer, I focused specifically on the case presented by the author of the question, I didn't try to apply it to each left outer join case. Not sure if the author was interested only in that particular case, or he wanted a general answer. Anyway, as you said, it may be or may not be associative, it all depends on how the join is written. – Przemyslaw Kruglej Nov 17 '13 at 10:49
  • Your diagrams are pretending to be Venn diagrams, but they're not. Look at them. Try to write an absolutely clear & complete legend. What things are being encircled when? A typical two-circle Venn diagram illustrating outer & inner joins has left & right circles around *output rows* that have a part from the left or right input tables. So a diagram for a second call would not have the encircled rows of the first call being encircled. – philipxy Mar 01 '18 at 23:41
20

If you're assuming that you're JOINing on a foreign key, as your question seems to imply, then yes, I think OUTER JOIN is guaranteed to be associative, as covered by Przemyslaw Kruglej's answer.

However, given that you haven't actually specified the JOIN condition, the pedantically correct answer is that no, they're not guaranteed to be associative. There are two easy ways to violate associativity with perverse ON clauses.

1. One of the JOIN conditions involves columns from all 3 tables

This is a pretty cheap way to violate associativity, but strictly speaking nothing in your question forbade it. Using the column names suggested in your question, consider the following two queries:

-- This is legal
SELECT * FROM (A JOIN B ON A.b_id = B.id) 
              JOIN C ON (A.id = B.id) AND (B.id = C.id)


-- This is not legal
SELECT * FROM A
              JOIN (B JOIN C ON (A.id = B.id) AND (B.id = C.id))
              ON A.b_id = B.id

The bottom query isn't even a valid query, but the top one is. Clearly this violates associativity.

2. One of the JOIN conditions can be satisfied despite all fields from one table being NULL

This way, we can even have different numbers of rows in our result set depending upon the order of the JOINs. For example, let the condition for JOINing A on B be A.b_id = B.id, but the condition for JOINing B on C be B.id IS NULL.

Thus we get these two queries, with very different output:

SELECT * FROM (A LEFT OUTER JOIN B ON A.b_id = B.id) 
              LEFT OUTER JOIN C ON B.id IS NULL;


SELECT * FROM A 
              LEFT OUTER JOIN (B LEFT OUTER JOIN C ON B.id IS NULL)
              ON A.b_id = B.id;

You can see this in action here: http://sqlfiddle.com/#!9/d59139/1

Mark Amery
  • 143,130
  • 81
  • 406
  • 459
  • For the first example, doesn't the illegality of the second query arise from the fact that you are referencing `A.id` even though you don't have any columns from A yet? So even for an inner join the query should be illegal, right? – Tianxiang Xiong Nov 16 '13 at 20:05
  • @xiongtx Correct. As I said in the answer, it's a pretty cheap way to violate associativity. My second example I was prouder of. :) – Mark Amery Nov 16 '13 at 22:35
4

In addition to the previous answers: The topic is nicely discussed in Michael M. David, Advanced ANSI SQL Data Modeling and Structure Processing, Artech House, 1999, pages 19--21. Pages available online.

I find particularly noteworthy that he discusses that the table (LEFT JOIN ...) and join clauses (ON ... ) have to be considered separately, so associativity could refer to both (re-arranging of table clauses and re-arranging of join conditions, i.e., on clauses). So the notion of associativity is not the same as for, e.g., addition of numbers, it has two dimensions.

Fabian
  • 2,822
  • 1
  • 17
  • 22
  • I think the issue of moving the join predicate around is a very important one. The author make a good point that joins are not strictly binary operators. – Tianxiang Xiong Nov 17 '13 at 21:15
  • Or we can simplify via `on`s with the same special condition. Eg full outer natural joins all using the same set of common columns is associative (ignoring column order). – philipxy Mar 02 '18 at 00:09