1

I have a main table and some sub-tables that contain at least 1 column from the main table. Sub-tables are updates of some part of main table. I would like to get updated row of main table at a specific date.

Main table:

table1
| id | colA | colB | colC | colD | colE | createDate  |
|:---|:----:|:----:|:----:|:----:|:----:|:-----------:|
| a1 |  1   |  1   |  1   |  1   |  1   |  2017/01/01 |

Sub-tables :

table2
| mainid | colA | colB | createdate  |
|:------:|:----:|:----:|:-----------:|
|   a1   |  2   |  2   |  2018/05/01 |
|   a1   |  3   |  3   |  2019/01/01 |
|   a1   |  4   |  4   |  2020/01/01 |

table3
| mainid | colA | colB | colC | createDate  |
|:------:|:----:|:----:|:----:|:-----------:|
|   a1   |  6   |  6   |  6   |  2019/01/01 |
|   a1   |  7   |  7   |  7   |  2020/01/01 |
|   a1   |  8   |  8   |  8   |  2021/01/01 |

table4
| mainid | colA | colE | colC | createDate  |
|:------:|:----:|:----:|:----:|:-----------:|
|   a1   |  9   |  9   |  9   |  2018/06/01 |
|   a1   |  10  |  10  |  10  |  2017/01/01 |
|   a1   |  12  |  12  |  12  |  2020/01/01 |

I get rows from each table by following code:

select * from table2 where createDate < '2018-07-01' and mainid='a1' order by createDate desc limit 1;
select * from table3 where createDate < '2018-07-01' and mainid='a1' order by createDate desc limit 1;
select * from table4 where createDate < '2018-07-01' and mainid='a1 'order by createDate desc limit 1;

select * from table1 where id = 'a1'; 

Now I want to combine these rows with the main table's row. If there are multiple values from different tables for 1 specific column, it should use the latest row like this:

table1 -> colD: 1
table2 -> colB: 2
table3 -> nothing
table4 -> colA: 9, colC: 9, colE: 9

 selected row :
| id | colA | colB | colC | colD | colE |filteredDate |
|:---|:----:|:----:|:----:|:----:|:----:|:-----------:|
| a1 |  9   |  2   |  9   |  1   |  9   |  2018/07/01 |

How can I get this done in one query? Is this possible? Should I try it in a different way?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228

2 Answers2

0

Assuming all columns to be NOT NULL or you'll have to do more.

For a start you can UNION ALL the queries you listed and fill in NULL values for missing columns to arrive at a compatible row type. Then aggregate. The remaining difficulty is that the perfect aggregate function for the task is not implemented in stock Postgres ...

With standard, stock SQL tools

SELECT id
    , (array_agg(colA ORDER BY colA IS NULL, createDate DESC))[1] AS colA
    , (array_agg(colB ORDER BY colB IS NULL, createDate DESC))[1] AS colB
    , (array_agg(colC ORDER BY colC IS NULL, createDate DESC))[1] AS colC
    , (array_agg(colD ORDER BY colD IS NULL, createDate DESC))[1] AS colD
    , (array_agg(colE ORDER BY colE IS NULL, createDate DESC))[1] AS colE
FROM (
   select      id, colA, colB, colC, colD, colE, createDate from table1 where id = 'a1'
   UNION ALL
   (select mainid, colA, colB, NULL, NULL, NULL, createDate from table2 where createDate < '2018-07-01' and mainid='a1' order by createDate desc limit 1)
   UNION ALL
   (select mainid, colA, colB, colC, NULL, NULL, createDate from table3 where createDate < '2018-07-01' and mainid='a1' order by createDate desc limit 1)
   UNION ALL
   (select mainid, colA, NULL, colc, NULL, colE, createDate from table4 where createDate < '2018-07-01' and mainid='a1' order by createDate desc limit 1)
   ) sub
GROUP BY 1;

With custom aggregate function first()

Simpler and faster with the help of a custom aggregate function like instructed in the Postgres Wiki here:

CREATE OR REPLACE FUNCTION first_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE SQL IMMUTABLE STRICT AS 'SELECT $1';
 
CREATE AGGREGATE FIRST (
        sfunc    = first_agg,
        basetype = anyelement,
        stype    = anyelement
);

Then:

SELECT id
     , first(colA) AS colA
     , first(colB) AS colB
     , first(colC) AS colC
     , first(colD) AS colD
     , first(colE) AS colE
FROM (
   SELECT      id, colA, colB, colC, colD, colE, createDate FROM table1 WHERE     id='a1'
   UNION ALL
   (SELECT mainid, colA, colB, NULL, NULL, NULL, createDate FROM table2 WHERE mainid='a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   UNION ALL
   (SELECT mainid, colA, colB, colC, NULL, NULL, createDate FROM table3 WHERE mainid='a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   UNION ALL
   (SELECT mainid, colA, NULL, colc, NULL, colE, createDate FROM table4 WHERE mainid='a1' AND createDate < '2018-07-01' ORDER BY createDate DESC LIMIT 1)
   ORDER BY createDate DESC
   ) sub
GROUP  BY 1;

Faster yet with a C implementation as provided by this additional module.

db<>fiddle here

Related, with more details and options:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

first_value from manual also works.

select distinct on(1) id,
       first_value(cola) over 
           (partition by id order by cola is null, createDate desc) cola_firstvalue,
       first_value(colb) over
           (partition by id order by colb is null, createDate desc) colb_firstvalue,
       first_value(colc) over
           (partition by id order by colc is null, createDate desc) colc_firstvalue,
       first_value(cold) over 
           (partition by id order by cold is null, createDate desc) cold_firstvalue,
       first_value(cole) over 
           (partition by id order by cole is null, createDate desc) cole_firstvalue
from
(select id, cola, colb, colc, cold, cole, createDate 
    from table1 where id = 'a1'
union all
select mainid, cola, colb, null,null,null,createdate 
    from table2 where mainid = 'a1' and createdate < '2018-07-01'
union all
select mainid, cola, colb, colc, null,null, createDate 
    from table3 where mainid = 'a1' and createdate < '2018-07-01'
union all
select mainid, cola, null, colc, null, cole, createdate 
    from table4 where mainid = 'a1' and createdate < '2018-07-01')
sub;
jian
  • 4,119
  • 1
  • 17
  • 32