1

I'm trying to join multiple tables together using a full outer join, its getting close to the proper result but there are some duplicate rows due to the join clauses. I've got several tables with columns of id, date, value. I'm looking to get a table with one row for each id, date pair that has all the values from each of the tables.

Here is a SQLFiddle if you want to play with it.

Here's what I've got so far:


SELECT
  COALESCE(T1.ID, T2.ID, T3.ID, t4.id) AS ID,
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt) AS DATE,
  T1.AMT1, T2.AMT2, T3.AMT3, t4.AMT4
FROM T1
FULL OUTER JOIN T2
ON
  T2.id = T1.id
  AND T2.event_dt = T1.event_dt
FULL OUTER JOIN T3
ON
  T3.id = T1.id
  AND T3.event_dt = T1.event_dt
FULL OUTER JOIN T4
ON
  T4.id = T1.id
  AND T4.event_dt = T1.event_dt
ORDER BY ID, DATE

This almost works, but I get some duplicate rows when for instance T4 has an ID,event_dt pair that is not in T1 (as to be expected because thats what I'm joining on). For instance, I'll get something like:


1   April, 06 2012 00:00:00+0000    (null)  2   (null)  (null)
1   April, 06 2012 00:00:00+0000    (null)  (null)  (null)  4
1   April, 06 2012 00:00:00+0000    (null)  (null)  3   (null)

When I'm looking to get:

1   April, 06 2012 00:00:00+0000    (null)   2   3   4

Is there a way to flatten/merge those rows together, or is there a better way to go about this altogether?

reptilicus
  • 10,290
  • 6
  • 55
  • 79
  • What's is the condition to group them? Because in your example you put just one value (2, 3 or 4) but in your fiddle you have multiple values, for e.g. for AMT1 you get 40 and 1. Is the Date field? So what is happen if you get the same date but multiple records? Did you think in normalize your table? – Leandro Bardelli May 10 '13 at 16:50
  • 1
    The condition to group them is by ID, event_dt pairs. So for every ID, event date I want all the values from T1, T2, T3, T4. The values I put in were just dummy values. – reptilicus May 10 '13 at 16:54
  • 1
    Is `(id,event_dt)` the primary key in all the tables? – ypercubeᵀᴹ May 10 '13 at 17:05
  • yup, (id, event_dt) is the pk – reptilicus May 10 '13 at 17:11
  • 1
    As far as "better way" is concerned, having four tables with seemingly identical predicates waves a red flag at *me*. I don't know your semantics, but that's suspicious. – Mike Sherrill 'Cat Recall' May 10 '13 at 17:12

4 Answers4

3

You could always use an aggregate around the amount columns:

SELECT
  COALESCE(T1.ID, T2.ID, T3.ID, t4.id) AS ID,
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt) AS DATE,
  max(coalesce(T1.AMT1, 0)) AMT1,  -- use coalesce to replace the null with zero
  max(coalesce(T2.AMT2, 0)) AMT2, 
  max(coalesce(T3.AMT3, 0)) AMT3, 
  max(coalesce(t4.AMT4, 0)) AMT4
FROM T1
FULL OUTER JOIN T2
  ON T2.id = T1.id
  AND T2.event_dt = T1.event_dt
FULL OUTER JOIN T3
  ON T3.id = T1.id
  AND T3.event_dt = T1.event_dt
FULL OUTER JOIN T4
  ON T4.id = T1.id
  AND T4.event_dt = T1.event_dt
group by  COALESCE(T1.ID, T2.ID, T3.ID, t4.id), 
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt)
ORDER BY ID, DATE;

See Demo

Taryn
  • 242,637
  • 56
  • 362
  • 405
  • If you don't trap for NULL before you do the MAX, it will not group correclty. – Declan_K May 10 '13 at 16:57
  • @Declan_K I added the coalesce, but it is not needed see this demo -- http://sqlfiddle.com/#!12/7993e/14 -- There is no grouping on the amount columns so it will not be incorrect – Taryn May 10 '13 at 17:05
  • You are right. Sorry for that. (What't the protcol here, is it OK for me to delete my stupid comment or should I leave it there?) – Declan_K May 10 '13 at 17:16
3

I think you join-citeria is simply not what you really want. This one should do the trick:

SELECT
  COALESCE(T1.ID, T2.ID, T3.ID, t4.id) AS ID,
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt) AS DATE,
  T1.AMT1, T2.AMT2, T3.AMT3, t4.AMT4
FROM T1
FULL OUTER JOIN T2
ON
  T2.id = T1.id
  AND T2.event_dt = T1.event_dt
FULL OUTER JOIN T3
ON
  T3.id = coalesce(T1.id, T2.id)
  AND T3.event_dt = coalesce(T1.event_dt, T2.event_dt)
FULL OUTER JOIN T4
ON
  T4.id = coalesce(T1.id, T2.id, T3.id)
  AND T4.event_dt = coalesce(T1.event_dt, T2.event_dt, T3.event_dt)
ORDER BY ID, DATE

SQL-Fiddle here gives you the desired output for 2012-04-06.

A.H.
  • 63,967
  • 15
  • 92
  • 126
2

Trap for NULLs, repalceing them with zeros, then find the MAX value in each column.

SELECT
  COALESCE(T1.ID, T2.ID, T3.ID, t4.id) AS ID,
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt) AS DATE,
  max( coalesce(T1.AMT1,0)) as amt1
, max( coalesce(T2.AMT2,0)) as amt2
, max( coalesce(T3.AMT3,0)) as amt3
, max( coalesce(t4.AMT4,0)) as amt4
FROM T1
FULL OUTER JOIN T2
ON
  T2.id = T1.id
  AND T2.event_dt = T1.event_dt
FULL OUTER JOIN T3
ON
  T3.id = T1.id
  AND T3.event_dt = T1.event_dt
FULL OUTER JOIN T4
ON
  T4.id = T1.id
  AND T4.event_dt = T1.event_dt
group by   COALESCE(T1.ID, T2.ID, T3.ID, t4.id),
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt)
ORDER BY ID, DATE

Here's the Fiddle

Declan_K
  • 6,726
  • 2
  • 19
  • 30
1

(assuming the OP wants a fully symmetric outer 4-join)

WITH four AS (
        SELECT id, event_dt FROM t1
        UNION
        SELECT id, event_dt FROM t2
        UNION
        SELECT id, event_dt FROM t3
        UNION
        SELECT id, event_dt FROM t4
        )
SELECT f.id, f.event_dt
        , t1.amt1
        , t2.amt2
        , t3.amt3
        , t4.amt4
FROM four f
LEFT JOIN t1 ON t1.id = f.id AND t1.event_dt = f.event_dt
LEFT JOIN t2 ON t2.id = f.id AND t2.event_dt = f.event_dt
LEFT JOIN t3 ON t3.id = f.id AND t3.event_dt = f.event_dt
LEFT JOIN t4 ON t4.id = f.id AND t4.event_dt = f.event_dt
ORDER BY id, event_dt
        ;

Result:

 id |  event_dt  | amt1 | amt2 | amt3 | amt4 
----+------------+------+------+------+------
  1 | 2012-04-01 |    1 |      |      |     
  1 | 2012-04-02 |    1 |      |    3 |     
  1 | 2012-04-03 |    1 |      |    3 |     
  1 | 2012-04-06 |      |    2 |    3 |    4
  1 | 2012-04-07 |      |    2 |      |     
  2 | 2012-04-01 |   40 |      |      |     
  2 | 2012-04-02 |      |      |    3 |     
  2 | 2012-04-03 |      |      |    3 |     
  2 | 2012-04-04 |   40 |      |      |     
(9 rows)

BTW: after the UNION four, LEFT JOINs will do the same as FULL JOINs here (union four already has all the possible {id, event_dt} pairs)

wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • All of these ended up working, I went with this one because it saved me some typing. – reptilicus May 11 '13 at 01:51
  • This solution looks pretty, but if you care about performance you should use a full outer join (like the solution provided by @A.H.) With my data this WITH/UNION query takes 1070ms... with a full outer join I achieve the same result in 87ms. – Ryan Jan 19 '14 at 09:40
  • 1
    If performance is the issue, please see the comment by @Mike Sherrill catcall above: the data model is broken anyway. You cannot fix a bad data model by means of a "clever" query. Fix the model first. – wildplasser Jan 19 '14 at 10:24