I have daily time series for companies in my dataset and I use PostgreSQL.
For every company, all rows with NULL in column3 should be deleted up to the first NOT NULL entry in this column for that company. After that, every remaining missing value should be filled in with the most recent NOT NULL value for that company.
You can imagine the following example data:
    date        company  column3
1   2004-01-01  A        5
2   2004-01-01  B        NULL
3   2004-01-01  C        NULL
4   2004-01-02  A        NULL
5   2004-01-02  B        7
6   2004-01-02  C        NULL
7   2004-01-03  A        6
8   2004-01-03  B        7
9   2004-01-03  C        9
10  2004-01-04  A        NULL
11  2004-01-04  B        NULL
12  2004-01-04  C        NULL
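To reproduce the example, the data above could be loaded roughly like this. The table name mytable is the one used in my query; the column types are assumptions, since only the values matter here.

```sql
-- Column types are assumed for illustration; only the values come from the example.
CREATE TABLE mytable (
    date    date,
    company text,
    column3 integer
);

INSERT INTO mytable (date, company, column3) VALUES
    ('2004-01-01', 'A', 5),
    ('2004-01-01', 'B', NULL),
    ('2004-01-01', 'C', NULL),
    ('2004-01-02', 'A', NULL),
    ('2004-01-02', 'B', 7),
    ('2004-01-02', 'C', NULL),
    ('2004-01-03', 'A', 6),
    ('2004-01-03', 'B', 7),
    ('2004-01-03', 'C', 9),
    ('2004-01-04', 'A', NULL),
    ('2004-01-04', 'B', NULL),
    ('2004-01-04', 'C', NULL);
```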
I would like to write a query that returns:
    date        company  column3
1   2004-01-01  A        5
2   2004-01-02  A        5
3   2004-01-02  B        7
4   2004-01-03  A        6
5   2004-01-03  B        7
6   2004-01-03  C        9
7   2004-01-04  A        6
8   2004-01-04  B        7
9   2004-01-04  C        9
I tried:
SELECT a.date, a.company,
       COALESCE(a.column3,
                (SELECT b.column3
                 FROM mytable b
                 WHERE b.company = a.company AND b.column3 IS NOT NULL
                 ORDER BY b.company = a.company DESC
                 LIMIT 1))
FROM mytable a;
There are two problems with this code:
- It does not delete the records with NULL values before the first NOT NULL value, but fills in all missing values.
- It fills them in with the first observation in the column and not with the last observation before the missing value.
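One direction that might address both problems is to use window functions instead of a correlated subquery. The following is only a sketch, assuming each company has at most one row per date: a running count of NOT NULL values per company (the alias grp is a name chosen here for illustration) stays at 0 for the leading NULL rows and increases by one at every observed value, so each group contains exactly one NOT NULL value followed by the NULL rows it should fill.

```sql
SELECT date,
       company,
       -- Each (company, grp) group holds one NOT NULL value, so max() returns it.
       max(column3) OVER (PARTITION BY company, grp) AS column3
FROM (
    SELECT date,
           company,
           column3,
           -- count() ignores NULLs: 0 before the first observed value,
           -- then +1 at each subsequent NOT NULL row.
           count(column3) OVER (PARTITION BY company ORDER BY date) AS grp
    FROM mytable
) sub
WHERE grp > 0          -- drops the rows before the first NOT NULL value
ORDER BY date, company;
```

On the example data this should give 5, 5, 6, 6 for company A, 7, 7, 7 for company B (from 2004-01-02 on), and 9, 9 for company C (from 2004-01-03 on), which matches the desired result.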