This is unlike the selecting row with latest timestamp question and is specific to Redshift
I want to allow users to update parts of a (staging) table row at different points in time while avoiding invoking UPDATE
statements. This is done by an append-only approach where we keep adding rows where only the unique id and a timestamp is mandatory and the other columns may or may not have a value provided.
Question:
Given a table where apart from "primary key" (not truely enforced) and a timestamp column, all other columns in that table are nullable, how do I merge all rows that have the same primary key into one row by picking the most recent non-null value for each nullable columns, if one such non-null value exists.
Example:
|id|timestamp|status|stringcol|numcol|
|1 |456 |begin | | |
|1 |460 | | | 2 |
|2 |523 | | foo | |
|1 |599 |mid | blah | |
|2 |624 |begin | | |
|1 |721 |done | | 60 |
should produce
|id|timestamp|status|stringcol|numcol|
|2 |624 |begin | foo | |
|1 |721 |done | blah | 60 |