How to do FULL OUTER JOIN in MS SQL Server when columns don't share any parameter including primary key?

Question

I have many tables (let's call them single-parameter-tables) which include ID (primary key) and another parameter (2 columns in each table). I wish to join all of them in a joined_table consisting ID and param_1, param_2, ...., param_n columns. The joined_table is NOT NULL for ID column (primary key) and Nullable for other columns. When the parameters share the ID value, I can do the FULL OUTER JOIN normally and there's no problem. But when one parameter doesn't share primary key with any of the other parameters, I face a problem. Simply speaking, assume for ID 124 there is some value for param_3 from the third single_param-table but no other occurrence and value in other single-parameter-tables.

My code is as follows:

Insert into [joined_table]

(ID, param_1,param_2,param_3)

SELECT
       ID
      ,param1
      ,param2
      ,param3

FROM 
(
  SELECT 
        -- here if I write just "A.ID as ID" I will receive error of unfilled primary key column)
        COALESCE( A.ID, B.ID, C.ID) as ID 
        , A.param_1    as param1
        , B.param_2    as param2
        , C.param_3    as param3
  FROM 
     (
        (SELECT ID, param_1  FROM single_param_table_1) A

        FULL OUTER JOIN
        (SELECT IِِD, param_2  FROM single_param_table_2) B on A.ID= B.ID

        FULL OUTER JOIN 
        (SELECT ID, param_3 FROM single_param_table_3) C on A.ID = C.ID
        -- or:
        -- ISNULL(A.ID, B.ID)= C.ID
     )  

) as joined ;

The error message that I receive is as follows:

Violation of PRIMARY KEY constraint 'PK_joined_table'. Cannot insert duplicate key in object 'joined_table'.

It seems like parameter 3 is not completely separate from other parameters and in case it shares the key, repeated row is tried to be inserted into the table.

Ideally I wish to have the result joined_table as this:

 ID  |  param 1  |  param 2  |  param 3
=======================================
123  |    11     |    12     |   NULL
---------------------------------------
124  |   NULL    |   NULL    |    23

Select all possible IDs and left join that to your current query (where you should use grouping to exclude duplicates). — Arvo, May 25 '20 at 12:10
@Iraj . . . Your problem is not when an id is missing. Your problem is either that `joined_table` already has rows in it or that one or more tables have duplicates. — Gordon Linoff, May 25 '20 at 12:16
@Arvo Thank you for your reply. Left join helps to avoid the error, but then I won't have any entry related to the last column (the column/parameter which doesn't share any common primary key with the other parameters) — Iraj, May 25 '20 at 13:05
@GordonLinoff Thank you! You saved me. The problem was with one duplicated ID in the last single_param_table. I used distinct in my select statement and it is solved now. `select distinct coalesce(t1.id, t2.id, t3.id, t4.id, . . . ) as id` — Iraj, May 25 '20 at 13:22
Does this answer your question? [Strange duplicate behavior from GROUP\_CONCAT of two LEFT JOINs of GROUP\_BYs](https://stackoverflow.com/questions/45250646/strange-duplicate-behavior-from-group-concat-of-two-left-joins-of-group-bys) — philipxy, May 25 '20 at 20:08
@philipxy Thank you for your reply. Seems like an interesting solution but however my problem was solved by the method that Gordan suggested. — Iraj, May 26 '20 at 12:12
The link I gave addresses the problem with duplicates. Gordon refers to that aspect of your code: "Your problem is that one or more table have duplicates. In the meantime [...]" Gordon's answer then gives another problem with your code that is also a common duplicate question. [Is it true that using INNER JOIN after any OUTER JOIN will essentially invalidate the effects of OUTER JOIN?](https://stackoverflow.com/q/55094277/3404097) [Left Outer Join doesn't return all rows from my left table?](https://stackoverflow.com/q/4707673/3404097) — philipxy, May 26 '20 at 21:21
Please in code questions give a [mre]--cut & paste & runnable code, including smallest representative example input as code; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. Give the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL that includes DBMS & DDL (including constraints & indexes) & input as code formatted as a table. [ask] Isolate the first expression that doesn't give what you expect & explain what you expected & why. — philipxy, May 26 '20 at 21:25

Gordon Linoff · Accepted Answer · 2020-05-25T12:17:32.707

1

Your problem is that one or more table have duplicates.

In the meantime, your FULL JOIN logic is filtering out rows that you seem to want. You can simplify and improve the logic:

select coalesce(t1.id, t2.id, t3.id, t4.id, . . . ) as id,
       t1.param as param1,
       t2.param as param2,
       t3.param as param3,
       t4.param as param4,
       . . .
from single_param_table_1 t1 full join
     single_param_table_2 t2
     on t2.id = t1.id full join
     single_param_table_3 t3
     on t3.id = coalesce(t1.id, t2.id) full join
     single_param_table_4 t4
     on t4.id = coalesce(t1.id, t2.id, t3.id) full join
     . . .

That is, you need lots of use of coalesce() so the ids match across the tables.

I should note that standard SQL and most databases support the using clause which simplifies this logic. However, SQL Server does not support using.

That simplifies your logic. However, your issue is that one or more tables have duplicate ids.

edited May 25 '20 at 12:17

answered May 25 '20 at 12:10

Gordon Linoff

1,242,037
58
646
786

Thank you for your reply. I implemented it and unfortunately still get the same error. If I use LEFT JOIN instead of FULL JOIN, I won't get error but the result joined_table will only contain the first parameters when the last one is NULL and no row with value for the last parameter. – Iraj May 25 '20 at 13:02
Update: I added "distinct" before selecting IDs and the problem is solved! `select distinct coalesce(t1.id, t2.id, t3.id, t4.id, . . . ) as id`, Thank you. – Iraj May 25 '20 at 13:17
1

@Iraj . . . You have duplicates in the underlying tables. If `select distinct` works, then the duplicates have the same value for the other column. – Gordon Linoff May 25 '20 at 21:25
Thank you. Yes I have duplicate in one of the underlying tables. "Distinct" helps to skip the duplicate. It probably only chooses the first entry by that duplicated ID. Is there any problem with this? I'm fine if it drops the duplicated ID (whatever the values of other parameters are) – Iraj May 26 '20 at 12:08
@Iraj . . . No problem -- except you should probably figure out why there are duplicates and fix that if it is an issue. My comment is that not only is the `id` duplicated, but `id`/`param` is duplicated. – Gordon Linoff May 26 '20 at 12:36

How to do FULL OUTER JOIN in MS SQL Server when columns don't share any parameter including primary key?

1 Answers1