Odd FULL OUTER JOIN "ON" Clause?

Question

I'm running into a slightly confusing issue.

Simplified query: (Assume ID is primary key in each table)

SELECT
A.ID,
A.Data,
B.Data,
C.Data
FROM
A FULL OUTER JOIN
B ON A.ID = B.ID FULL OUTER JOIN
C ON A.ID = C.ID

I'm using FULL OUTER JOIN's because in my situation there is no guarantee that any ID is in all three tables, but if it is in more than one table, I want to see all the data for it in one row.

Here's the problem that I quickly realized: If there is an ID that is in both tables B and C (but not A), then you run into the following issues:

The ID field is NULL for ID's that don't appear in table A. This makes sense, since the query selects A.ID. I found a pretty easy way around this, which is to use COALESCE (i.e. COALESCE(A.ID,B.ID,C.ID)).
The data for ID's that aren't in table A are returned in two separate rows. (One row has NULL's for B's data and the other has NULL's for C's data.) After thinking about it, this also makes sense because of the way the query above is written. Both tables B and C join based off of table A, so if the ID isn't in table A, then the query has no relationship to join that ID in tables B and C. I found a way around this as well, which is to explicitly specify the relationship to every table before it in the ON clause, separated by OR's.

So making the following changes will fix these two problems:

SELECT
COALESCE(A.ID,B.ID,C.ID),
A.Data,
B.Data,
C.Data
FROM
A FULL OUTER JOIN
B ON A.ID = B.ID FULL OUTER JOIN
C ON A.ID = C.ID OR B.ID = C.ID

This works fine, but it took me some time to figure this out, and future personnel who run across this query might it strange, because using COALESCE and a bunch of OR's in the ON clause at first glance seems superfluous, but actually both are needed.

This can also get very confusing for larger queries, because the size of the ON clause is compounded for each table that joins this way.

My question is: Is there some other built-in way or other trick to deal with OUTER JOIN's of this type that already take into account these extra conditions that you don't need to consider for INNER JOIN's?

No, there is no "built-in way". No, we should not be seeking "tricks". And what you describe is a consequence of outer joins generally - not just full joins. IME the use of is generally rare but I do see many odd or "long winded" or "difficult to write/explain" queries that are the result of a compromised schema. Here one can argue that you have 3 different entities (tables) the represent the same "thing" - a normalization issue. — SMor, Jul 14 '21 at 17:59
@SMor Thanks, I wasn't necessarily looking for a "trick", I'm OK with the current way, just was wondering if there was another way because I found this strategy particularly unusual. But yes, I agree that this type of query is generally rare and in this case the three tables could be the same "thing". Unfortunately in this situation there isn't a "master" table that contains a list of all ID's which I realize is usually the case. This is the first situation I've encountered where that isn't the case. — ImaginaryHuman072889, Jul 14 '21 at 18:05
Yes - sometimes the schema forces our hand when writing queries to generate useful information. But as I wrote, this is a product of outer joins generally and is something that the query writer needs to understand and address. — SMor, Jul 14 '21 at 18:21
I believe Another way to do it is join A->B, A->C, B->C group by the key's of each table and select hte max values from each table. duplicates would be eliminated per key via aggregration.. I'd need sample data to test it out.. or to perhaps PIVOT the data based on the Key's instead.... no key, no pivoted data; but the columns would still pivot... so long as at least 1 table had 1 record. — xQbert, Jul 14 '21 at 20:25
Yes there is another way - `UNION ALL` the three tables. Outside the UNION ALL, group by the keys and use various aggregate functions to boil 1-3 rows down to 1. This only works well if the full list of columns you're interested in match across all tables — Nick.Mc, Jul 14 '21 at 23:07

score 1 · Answer 1 · answered Jul 14 '21 at 20:36

Imagine a cross tab query, or a pivot doing this:

Here's an example in excel.

This way instead of the joins, you're pivoting on your ID column and it's irrelevant if data exists or not in other tables; so long as 1 table has data the values from that table will show.

Here's a visual of what I mean using Excel:

SO ID is your ID in each table. Table are the 3 tables involved. Data is the value associated to each ID. By pivoting the data you can see table A has values in all 3 records while b only has it on records 2 and 3 while c only has it on records 1 and 2. Using a Dynamic Pivot in SQL Server, I believe you could accomplish teh same thing. However this requires the use of Dynamic SQL: Example: SQL Server dynamic PIVOT query?

score 1 · Accepted Answer · answered Jul 14 '21 at 23:15

This is another way. It's not necessarily more or less complicated or performant. You'd need to check for your case. As already stated, having to do this may indicate a modelling issue.

SELECT ID, MAX(Data1), MAX(Data2), MAX(Data3)
FROM
(
SELECT A.ID, A.Data1, NULL, NULL
FROM A
UNION ALL
SELECT B.ID, NULL, B.Data2, NULL
FROM B
UNION ALL
SELECT C.ID, NULL, NULL, C.Data3
FROM C
) T
GROUP BY ID

score 0 · Answer 3 · answered Jul 14 '21 at 22:48

I think this will do what you want:

FROM A FULL OUTER JOIN B 
  ON A.ID = B.ID 
OUTER JOIN C ON A.ID = C.ID or B.ID = C.ID

That gives you all rows in A and B, and anything in C that matches either one of them.

if it is in more than one table, I want to see all the data

To exclude cases where ID appears in only one table, add a bunch of WHERE clauses specifying the allowed combinations,

where A.ID is not NULL and B.ID is not NULL 
   or B.ID is not NULL and C.ID is not NULL 
   or A.ID is not NULL and C.ID is not NULL

score 0 · Answer 4 · answered Jul 15 '21 at 04:00

thanks to @xQbert for the data.

You can achieve the same using CTE (with unique IDs list) and then going for CROSS APPLY.

DECLARE @a table(id int, data int)
DECLARE @b table(id int, data int)
DECLARE @c table(id int, data int)

insert into @a values(1, 9), (2, 8),(3,7)
insert into @b values(2, 6), (3, 5)
insert into @c values(1, 4), (2, 5)

;WITH CTE_IDs as
(
SELECT ID from @a
UNION
SELECT ID FROM @b
UNION
SELECT ID FROM @c
)
SELECT c.ID,t.* FROM CTE_IDs as c
cross apply
(
VALUES((select data from @a where id = c.id),
(select data from @b where id = c.id),
(select data from @c where id = c.id)) 
) as t(a_data,b_data,c_data)

ID	a_data	b_data	c_data
1	9	NULL	4
2	8	6	5
3	7	5	NULL

Odd FULL OUTER JOIN "ON" Clause?

4 Answers4