Why does FULL JOIN order make a difference in these queries?

Question

I'm using PostgreSQL. Everything I read here suggests that in a query using nothing but full joins on a single column, the order of tables joined basically doesn't matter.

My intuition says this should also go for multiple columns, so long as every common column is listed in the query where possible (that is, wherever both joined tables have the column in common). But this is not the case, and I'm trying to figure out why.

Simplified to three tables a, b, and c.

Columns in table a: id, name_a
Columns in table b: id, id_x
Columns in table c: id, id_x

This query:

SELECT *
FROM a
    FULL JOIN b USING(id)
    FULL JOIN c USING(id, id_x);

returns a different number of rows than this one:

SELECT *
FROM a
    FULL JOIN c USING(id)
    FULL JOIN b USING(id, id_x);

What I want/expect is hard to articulate, but basically, a I'd like a "complete" full merger. I want no null fields anywhere unless that is unavoidable.

For example, whenever there is a not-null id, I want the corresponding name column to always have the name_a and not be null. Instead, one of those example queries returns semi-redundant results, with one row having a name_a but no id, and another having an id but no name_a, rather than a single merged row.

When the joins are listed in the other order, I do get that desired result (but I'm not sure what other problems might occur, because future data is unknown).

Are you expecting there could be NULL in the input tables? Do some of your expectations depend on FULL JOIN on superkeys (UNIQUE NOT NULL)? Are there any other restrictions/constraints on the the tables or the joins? What exactly are your desired/expected results in terms of the inputs? Please give example input, query, and expected/desired & actual output. PS I suspect no chain of such joins is the query for the result you want. I suspect you need certain unions, and that OR plays a large role in your spec. Try to describe & give examples for what you want for 2 then 3 tables. — philipxy, Jul 23 '17 at 12:13
[Is there any rule of thumb to construct SQL query from a human-readable description?](https://stackoverflow.com/a/33952141/3404097) — philipxy, Jul 23 '17 at 12:24
You are still not clear about what you want in the comments you wrote on GordonLinoff's answer. Also, "If there are multiple rows with..." there suggests, as I hypothesized in my comment on your question, that your situation involves certain assumptions *but you have not given them*. Things are more complicated than you think. If you want an answer to your question, whatever it is, please read & act on [mcve], then read & act on my comment. At the very least, give an examle of input & expected output. Also, comments are not for clarifications, please edit your question. Also, see my answer. — philipxy, Jul 30 '17 at 08:25
Here is how to "articulate": Please (1) give PKs, UNIQUEs, NOT NULLs, FKs & any other restrictions on what your tables can hold at one time (whether declared or not) & (2) finish this sentence (assuming "∈" is like SQL "in" but matches columns on same value or both null): I want rows with columns (id,id_x,name) where [(id,id_x) ∈ (b union c) and ((id,name) in a or (id) not in (select id from a) and name is null)] or ... . Maybe ... or (id) not in (select id from (b union c)) and (id,name) ∈ a and id_x is null. — philipxy, Aug 18 '17 at 00:24

score 1 · Answer 1 · answered Jul 19 '17 at 18:49

1

Your queries are different.

In the first, you are doing a full join to b using a single column, id.

In the second, you are doing a full join to b using two columns.

Although the two queries could return the same results under some circumstances, there is not reason to think that the results would be comparable.

answered Jul 19 '17 at 18:49

Gordon Linoff

1,242,037
58
646
786

That makes sense. But what I don't entirely get is how I can force the last table's `id` to always be fused with the first one's, as much as with the second's. Intuitively that's how I expect `FULL JOIN` to work when I have the `USING` keyword, because only one `id` column is returned. – Lenoxus Jul 19 '17 at 19:37
1

@Lenoxus What does "always be fused" mean? PS Definitions, not intuitions. – philipxy Jul 23 '17 at 11:49
I understand why it's unclear. I want the results table to have as few rows as possible. So: no null values except where a value would be truly indeterminable. If there are multiple rows with, say, the same id and id_x values, this must be because both tables b and c have rows with those values. The result table should reflect every relationship/overlap in those tables' data, but no more than that. – Lenoxus Jul 24 '17 at 13:01

philipxy · Answer 2 · 2017-07-23T15:18:47.563

Argument order matters in OUTER JOINs, except that FULL NATURAL JOIN is symmetric. They return what an INNER JOIN (ON, USING or NATURAL) does but also the unmatched rows from the left (LEFT JOIN), right (RIGHT JOIN) or both (FULL JOIN) tables extended by NULLs.

USING returns the single shared value for each specified column in INNER JOIN rows; in NULL-extended rows another common column can have NULL in one table's version and a value in the other's.

Join order matters too. Even FULL NATURAL JOIN is not associative, since with multiple tables each pair of tables (either operand being an original or join result) can have a unique set of common columns, ie in general (A ⟗ B) ⟗ C ≠ A ⟗ (B ⟗ C).

There are a lot of special cases where certain additional identities hold. Eg FULL JOIN USING all common column names and OUTER JOIN ON equality of same-named columns are symmetric. Some cases involve CKs (candidate keys), FKs (foreign keys) and other constraints on arguments.

Your question doesn't make clear exactly what input conditions you are assuming or what output conditions you are seeking.

Why does FULL JOIN order make a difference in these queries?

2 Answers2