Redshift Merge rows and conflict-resolve by timestamp

Question

This is unlike the selecting row with latest timestamp question and is specific to Redshift

I want to allow users to update parts of a (staging) table row at different points in time while avoiding invoking UPDATE statements. This is done by an append-only approach where we keep adding rows where only the unique id and a timestamp is mandatory and the other columns may or may not have a value provided.

Question:

Given a table where apart from "primary key" (not truely enforced) and a timestamp column, all other columns in that table are nullable, how do I merge all rows that have the same primary key into one row by picking the most recent non-null value for each nullable columns, if one such non-null value exists.

Example:

|id|timestamp|status|stringcol|numcol|
|1 |456      |begin |         |      |
|1 |460      |      |         |  2   |
|2 |523      |      |  foo    |      |
|1 |599      |mid   |  blah   |      |
|2 |624      |begin |         |      |
|1 |721      |done  |         |  60  |

should produce

|id|timestamp|status|stringcol|numcol|
|2 |624      |begin |  foo    |      |
|1 |721      |done  |  blah   |  60  |

There is a `DISTINCT ON` feature in PostgreSQL that does exactly this! However, it is not available in Amazon Redshift. Here is something that claims to mimic that behaviour: [DISTINCT ON like functionality for Redshift](https://gist.github.com/jmindek/62c50dd766556b7b16d6) — John Rotenstein, May 06 '19 at 21:10

Lim · Answer 1 · 2019-05-07T23:04:10.253

1

This can be achieved using Redshift's LISTAGG function combined with SPLIT_PART function.

LISTAGG concatenates all values in a group into a single string, optionally allow you to order to contatenation and provide a delimiter.
SPLIT_PART splits a string by a delimiter and returns the chosen part

Using the above example 5-column table, you would need something like this:

SELECT id, 
       MAX(last_updated),
       SPLIT_PART(LISTAGG(status, ',') WITHIN GROUP(ORDER BY last_updated DESC), ',', 1),
       SPLIT_PART(LISTAGG(stringcol, ',') WITHIN GROUP(ORDER BY last_updated DESC), ',', 1),
       SPLIT_PART(LISTAGG(numcol, ',') WITHIN GROUP(ORDER BY last_updated DESC), ',', 1)
FROM table
GROUP BY 1;

edited May 07 '19 at 23:04

answered May 06 '19 at 19:08

Lim

329
2
15

1

The above query needs `last_updated` be part of group by too – demircioglu May 06 '19 at 20:39
Thanks @mdem7 it was a typo but actually on the last_updated instead of the `group by` – Lim May 07 '19 at 23:05

Redshift Merge rows and conflict-resolve by timestamp

1 Answers1