Retrieve last known value for each column of a row

Question

I'm not sure of the correct words to ask this question, so I will break it down.

I have a table as follows:

date_time | a | b | c

Last 4 rows:

15/10/2013 11:45:00 | null   | 'timtim' | 'fred'
15/10/2013 13:00:00 | 'tune' | 'reco'   | null
16/10/2013 12:00:00 | 'abc'  | null     | null
16/10/2013 13:00:00 | null   | 'died'   | null

How would I get the last record, but with each null value replaced by the value from the most recent previous record that has a value in that column?

In my provided example the row returned would be

16/10/2013 13:00:00 | 'abc' | 'died' | 'fred'

As you can see, if the value for a column is null, it falls back to the last record that has a value for that column and uses that value.

This should be possible, I just can't figure it out. So far I have only come up with:

select 
    last_value(a) over w a
from test
WINDOW w AS (
    partition by a
    ORDER BY ts asc
    range between current row and unbounded following
    );

But this only caters for a single column ...

Is the "last" row defined by insertion order or by the max datetime? — Jorge Campos, Nov 27 '13 at 15:39
There is no natural order in a table. A table is a set without order. The `"last data is the order that was inserted"` is not defined as long as you do not store that information somewhere. CTID is *not reliable*. It can change any time with any update or restore. — Erwin Brandstetter, Nov 27 '13 at 22:11

Erwin Brandstetter · Answer 1 · 2021-11-05T03:05:00.600

Order of rows

The "last row" and the sort order would need to be defined unambiguously. There is no natural order in a set (or a table). I am assuming ORDER BY ts, where ts is the timestamp column.
Like @Jorge pointed out in his comment: If ts is not UNIQUE, one needs to define tiebreakers for the sort order to make it unambiguous (add more items to ORDER BY). A primary key would be the ultimate solution.
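A minimal sketch of such a tiebreaker, assuming a hypothetical `serial` primary key `id` that is not part of the question's table. Keep in mind that a `serial` column records the order in which values were drawn from the sequence; under concurrent inserts that is not guaranteed to match commit order.

```sql
-- Hypothetical schema: "id" is NOT in the question's table; it is added here as a tiebreaker.
CREATE TEMP TABLE t (
   id serial PRIMARY KEY  -- unique, so (ts, id) defines a total order
 , ts timestamp NOT NULL
 , a  text
 , b  text
 , c  text
);

INSERT INTO t (ts, a, b, c) VALUES
   ('2013-10-15 11:45', NULL  , 'timtim', 'fred')
 , ('2013-10-15 13:00', 'tune', 'reco'  , NULL  )
 , ('2013-10-16 12:00', 'abc' , NULL    , NULL  )
 , ('2013-10-16 13:00', NULL  , 'died'  , NULL  );

-- The "last row" is now unambiguous, even if two rows share the same ts.
SELECT *
FROM   t
ORDER  BY ts DESC, id DESC
LIMIT  1;

-- Use the same tiebreaker in the window definitions, e.g.:
-- count(a) OVER (ORDER BY ts, id) AS grp_a
```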

General solution with window functions

To get a result for every row:

SELECT ts
     , max(a) OVER (PARTITION BY grp_a) AS a
     , max(b) OVER (PARTITION BY grp_b) AS b
     , max(c) OVER (PARTITION BY grp_c) AS c
FROM (
   SELECT *
        , count(a) OVER (ORDER BY ts) AS grp_a
        , count(b) OVER (ORDER BY ts) AS grp_b
        , count(c) OVER (ORDER BY ts) AS grp_c
   FROM t
   ) sub;

How?

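The aggregate function count() ignores NULL values when counting. Used as an aggregate window function, it computes the running count of a column according to the default frame, which (when the window has an ORDER BY) is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. NULL values don't increase the count, so each row with a NULL falls into the same group (same grp_* value) as the last preceding row with a non-null value. Because the frame is in RANGE mode, rows with equal ts are peers and get the same count, which is another reason ts needs to be unique.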
In a second window function, the only non-null value per group is easily extracted with max() or min().
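To see the grouping step on the question's sample data, the sketch below loads it into a temporary table `t` (assuming `ts` is unique) and runs only the inner subquery. Rows 3 and 4 share grp_a = 2, so max(a) over that partition yields 'abc' for both; row 1 keeps NULL for a because no earlier non-null a exists.

```sql
-- Sample data from the question as a temporary table (assumes ts is UNIQUE).
CREATE TEMP TABLE t (ts timestamp, a text, b text, c text);
INSERT INTO t VALUES
   ('2013-10-15 11:45', NULL  , 'timtim', 'fred')
 , ('2013-10-15 13:00', 'tune', 'reco'  , NULL  )
 , ('2013-10-16 12:00', 'abc' , NULL    , NULL  )
 , ('2013-10-16 13:00', NULL  , 'died'  , NULL  );

-- Step 1 only: the running counts that define the groups.
SELECT ts, a, b, c
     , count(a) OVER (ORDER BY ts) AS grp_a
     , count(b) OVER (ORDER BY ts) AS grp_b
     , count(c) OVER (ORDER BY ts) AS grp_c
FROM   t
ORDER  BY ts;

-- ts                  | a    | b      | c    | grp_a | grp_b | grp_c
-- 2013-10-15 11:45:00 |      | timtim | fred |     0 |     1 |     1
-- 2013-10-15 13:00:00 | tune | reco   |      |     1 |     2 |     1
-- 2013-10-16 12:00:00 | abc  |        |      |     2 |     2 |     1
-- 2013-10-16 13:00:00 |      | died   |      |     2 |     3 |     1
```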

Just the last row

WITH cte AS (
   SELECT *
        , count(a) OVER w AS grp_a
        , count(b) OVER w AS grp_b
        , count(c) OVER w AS grp_c
   FROM   t
   WINDOW w AS (ORDER BY ts)
   ) 
SELECT ts
     , max(a) OVER (PARTITION BY grp_a) AS a
     , max(b) OVER (PARTITION BY grp_b) AS b
     , max(c) OVER (PARTITION BY grp_c) AS c
FROM   cte
ORDER  BY ts DESC
LIMIT  1;

Simple alternatives for just the last row

SELECT ts
      ,COALESCE(a, (SELECT a FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS a
      ,COALESCE(b, (SELECT b FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS b
      ,COALESCE(c, (SELECT c FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS c
FROM   t
ORDER  BY ts DESC
LIMIT  1;

Or:

SELECT (SELECT ts FROM t                     ORDER BY ts DESC LIMIT 1) AS ts
      ,(SELECT a  FROM t WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1) AS a
      ,(SELECT b  FROM t WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1) AS b
      ,(SELECT c  FROM t WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1) AS c;

Performance

While this should be decently fast, if performance is your paramount requirement, consider a plpgsql function: start with the last row and loop in descending order until you have a non-null value for every column required. Along the lines of this related answer:

GROUP BY and aggregate sequential numeric values
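A minimal sketch of such a function, assuming the table `t(ts, a, b, c)` with `ts` NOT NULL and UNIQUE; the function name `f_last_known_values` is hypothetical. It walks the rows newest first and exits as soon as every column has a value. Whether that actually beats the set-based queries depends on an index on `ts` letting the loop read the newest rows first; with the sample data it still visits all four rows, because c only has a value in the oldest one.

```sql
-- Sample data (temporary table; assumes ts is NOT NULL and UNIQUE).
CREATE TEMP TABLE t (ts timestamp NOT NULL UNIQUE, a text, b text, c text);
INSERT INTO t VALUES
   ('2013-10-15 11:45', NULL  , 'timtim', 'fred')
 , ('2013-10-15 13:00', 'tune', 'reco'  , NULL  )
 , ('2013-10-16 12:00', 'abc' , NULL    , NULL  )
 , ('2013-10-16 13:00', NULL  , 'died'  , NULL  );

-- Hypothetical function name; loops from the newest row and stops early.
CREATE OR REPLACE FUNCTION f_last_known_values()
  RETURNS TABLE (ts timestamp, a text, b text, c text)
  LANGUAGE plpgsql AS
$func$
DECLARE
   r         record;
   first_row boolean := true;
BEGIN
   FOR r IN
      SELECT t.ts, t.a, t.b, t.c   -- table-qualified to avoid clashing with the OUT parameters
      FROM   t
      ORDER  BY t.ts DESC
   LOOP
      IF first_row THEN
         ts := r.ts;               -- timestamp of the latest row
         first_row := false;
      END IF;
      a := COALESCE(a, r.a);       -- keep the first (newest) non-null value per column
      b := COALESCE(b, r.b);
      c := COALESCE(c, r.c);
      EXIT WHEN a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL;
   END LOOP;

   IF NOT first_row THEN           -- return no row for an empty table
      RETURN NEXT;
   END IF;
END
$func$;

SELECT * FROM f_last_known_values();
-- 2013-10-16 13:00:00 | abc | died | fred
```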

Your answer doesn't work properly. See this test: http://sqlfiddle.com/#!12/4ba24/1 — Jorge Campos, Nov 27 '13 at 22:02
@JorgeCampos: Sure it does. Your example only demonstrates that ordering by the timestamp column `ts` alone is *not well defined* if ts is not `UNIQUE`. The question is unclear in this respect. There is no natural order in a table of an RDBMS like you seem to assume. Ordering by CTID is not reliable. BTW, I added simple versions without window functions. — Erwin Brandstetter, Nov 27 '13 at 22:08
I think you misunderstood me. I love your answer; it is by far more elegant than my own. I just pointed this out because of the discussion with the OP in the comments, so you could fix it. If you fix this, I would love to see the OP unmark my answer and mark yours. As I said, my answer is an ugly solution. :) — Jorge Campos, Nov 27 '13 at 22:12
@JorgeCampos: Well, thanks for your fiddle. I added a paragraph to clarify the matter of the sort order in response to that. — Erwin Brandstetter, Nov 27 '13 at 22:24

Matthew Plourde · Answer 2 · 2013-11-27T16:55:36.390

Here I create an aggregation function that collects columns into arrays. Then it is just a matter of removing the NULLs and selecting the last element from each array.

Sample Data

CREATE TABLE T (
    date_time timestamp,
    a text,
    b text,
    c text
);

INSERT INTO T VALUES ('2013-10-15 11:45:00', NULL, 'timtim', 'fred'),
('2013-10-15 13:00:00', 'tune', 'reco', NULL  ),
('2013-10-16 12:00:00', 'abc', NULL, NULL     ),
('2013-10-16 13:00:00', NULL, 'died', NULL    );

Solution

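-- Note: on PostgreSQL 14 and later, array_append() is declared with
-- anycompatiblearray/anycompatible, so declare this aggregate as
-- array_accum (anycompatible) with stype = anycompatiblearray instead.
CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);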

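WITH latest_nonull AS (
    SELECT MAX(date_time) AS MaxDateTime,
           -- ORDER BY inside each aggregate call fixes the element order;
           -- a plain ORDER BY date_time on this aggregate query would raise an error
           array_remove(array_accum(a ORDER BY date_time), NULL) AS A,
           array_remove(array_accum(b ORDER BY date_time), NULL) AS B,
           array_remove(array_accum(c ORDER BY date_time), NULL) AS C
    FROM T
)
SELECT MaxDateTime, A[array_upper(A, 1)] AS a, B[array_upper(B, 1)] AS b, C[array_upper(C, 1)] AS c
FROM latest_nonull;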

Result

     maxdatetime     |  a  |  b   |  c
---------------------+-----+------+------
 2013-10-16 13:00:00 | abc | died | fred
(1 row)
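For PostgreSQL 9.0 to 9.2, where array_remove() is not available, a sketch of the same idea using the built-in array_agg() with an ORDER BY inside the call: sorting on `a IS NULL` first puts non-null values at the front (false sorts before true), then newest first, so element [1] is the latest non-null value. It creates its own temporary copy of the sample table `T`.

```sql
-- Sample data as above, as a temporary table.
CREATE TEMP TABLE T (date_time timestamp, a text, b text, c text);
INSERT INTO T VALUES
   ('2013-10-15 11:45:00', NULL  , 'timtim', 'fred')
 , ('2013-10-15 13:00:00', 'tune', 'reco'  , NULL  )
 , ('2013-10-16 12:00:00', 'abc' , NULL    , NULL  )
 , ('2013-10-16 13:00:00', NULL  , 'died'  , NULL  );

-- No custom aggregate and no array_remove() needed.
SELECT max(date_time)                                        AS maxdatetime
     , (array_agg(a ORDER BY a IS NULL, date_time DESC))[1] AS a
     , (array_agg(b ORDER BY b IS NULL, date_time DESC))[1] AS b
     , (array_agg(c ORDER BY c IS NULL, date_time DESC))[1] AS c
FROM   T;
-- 2013-10-16 13:00:00 | abc | died | fred
```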

answered Nov 27 '13 at 16:16 · edited Nov 27 '13 at 16:55

Would this work if the last registry had the date, say, 15/10/2013 11:45:00? – Jorge Campos Nov 27 '13 at 16:22
Sorry, but I'm new to Postgres. When I try to create the aggregate I get the error ERROR: function array_prepend(anyarray, anyelement) does not exist SQL state: 42883 – cghrmauritius Nov 27 '13 at 16:26
@JorgeCampos no, I didn't see the comments under the question. I'll fix it. This assumes the last insert is the latest timestamp. – Matthew Plourde Nov 27 '13 at 16:27
@cghrmauritius That was a typo; if you look at the code now, it has `array_append` instead of `array_prepend`. – Matthew Plourde Nov 27 '13 at 16:28
OK, I seem to be getting somewhere. I've managed to create the aggregate, thanks, but the WITH query is throwing ERROR: function array_remove(text[], unknown) does not exist SQL state: 42883 Hint: No function matches the given name and argument types. You might need to add explicit type casts. Character: 72 – cghrmauritius Nov 27 '13 at 16:30
@JorgeCampos actually, I'm not sure you can. Maybe someone can correct me, but unless OP is keeping track of the order of inserts with an additional column, there's no way to sort on insert time. – Matthew Plourde Nov 27 '13 at 16:31
@cghrmauritius what version of Postgresql are you using? – Matthew Plourde Nov 27 '13 at 16:32
current version is 9.2 – cghrmauritius Nov 27 '13 at 16:33
9.3 is the latest. Looks like 9.2 doesn't have `array_remove`. – Matthew Plourde Nov 27 '13 at 16:35
You can if you add the CTID to your query. I was trying to solve this with unions using max(CTID) and limit 1 – Jorge Campos Nov 27 '13 at 16:35
@JorgeCampos Looks like CTID changes when a row updates. This would work if OP only performs INSERTs and DELETEs on this table. – Matthew Plourde Nov 27 '13 at 16:38
Yeah, I know, but if he just wants the final result it will fit, just for an SQL result. I will post my attempt as an answer. – Jorge Campos Nov 27 '13 at 16:43
Hi, it is just the final result I'm interested in. It's a case where a certain portion of the desktop application shows the latest data held on a patient. – cghrmauritius Nov 27 '13 at 16:50
@cghrmauritius Could you have a situation like JorgeCampos describes, where the last entry inserted doesn't have the latest timestamp? – Matthew Plourde Nov 27 '13 at 16:53
Ok, then this is the solution you want, if you can update to 9.3. Notice I just made a change, adding a missing `ORDER BY` clause. – Matthew Plourde Nov 27 '13 at 16:56
@MatthewPlourde I wish I knew how to update to 9.3. I have no idea how, and I'm on a live CentOS 6 box. Can it be done with yum? – cghrmauritius Nov 27 '13 at 16:59
Yeah, I'm fairly certain it can. – Matthew Plourde Nov 27 '13 at 17:00

Jorge Campos · Accepted Answer · answered Nov 27 '13 at 16:44

This should work, but keep in mind it is an ugly solution:

select * from
   (select dt from
      (select rank() over (order by ctid desc) idx, dt
       from sometable) cx
    where idx = 1) dtz,
   (select a from
      (select rank() over (order by ctid desc) idx, a
       from sometable where a is not null) ax
    where idx = 1) az,
   (select b from
      (select rank() over (order by ctid desc) idx, b
       from sometable where b is not null) bx
    where idx = 1) bz,
   (select c from
      (select rank() over (order by ctid desc) idx, c
       from sometable where c is not null) cx
    where idx = 1) cz;

See it here at fiddle: http://sqlfiddle.com/#!15/d5940/40

The result will be

DT                                   A        B      C
October, 16 2013 00:00:00+0000      abc     died    fred

This builds on a "natural order" that does not exist. Also, you wouldn't need window functions for this approach at all. Consider the "simple alternatives" in my answer. — Erwin Brandstetter, Nov 27 '13 at 22:13