3

Is the data imputation method Last Observation Carried Forward (LOCF) implemented in PostgreSQL?
If not, how could I implement this method?

TmTron
  • 17,012
  • 10
  • 94
  • 142
Hello lad
  • 17,344
  • 46
  • 127
  • 200

2 Answers2

6

The following code assumes a table tbl with columns a, b (keys), t (time) and v (value to locf impute):

create or replace function locf_s(a float, b float)
returns float
language sql
as '
  select coalesce(b, a)
';

drop aggregate if exists locf(float);
CREATE AGGREGATE locf(FLOAT) (
  SFUNC = locf_s,
  STYPE = FLOAT
);

select a,b,t,v,
    locf(v) over (PARTITION by a,b ORDER by t) as v_locf
from tbl
order by a,b,t
;

(SQLFiddle)


For a tutorial: "LOCF and Linear Imputation with PostgreSQL"

Raffael
  • 19,547
  • 15
  • 82
  • 160
  • instead of `float` we can use the [anyelement pseudo-type](https://www.postgresql.org/docs/current/datatype-pseudo.html) to make the function more flexible – TmTron Oct 13 '22 at 08:48
2

I based this table and data directly on the table in the linked article.

create table test (
  unit integer not null
    check (unit >= 1),
  obs_time integer not null
    check (obs_time >= 1),
  obs_value numeric(5, 1),
  primary key (unit, obs_time)
);

insert into test values
(1, 1, 3.8), (1, 2, 3.1), (1, 3, 2.0),
(2, 1, 4.1), (2, 2, 3.5), (2, 3, 3.8), (2, 4, 2.4), (2, 5, 2.8), (2, 6, 3.0),
(3, 1, 2.7), (3, 2, 2.4), (3, 3, 2.9), (3, 4, 3.5);

For the six observations in the linked article we need all the possible combinations of "unit" and "obs_time".

select distinct unit, times.obs_time 
from test
cross join (select generate_series(1, 6) obs_time) times;
unit  obs_time
--
1     1
1     2
1     3
1     4
1     5
1     6
2     1
. . .
3     6

We also need to know which row has the last observed value in it for each unit.

select unit, max(obs_time) obs_time
from test
group by unit
order by unit;
unit  obs_time
--
1     3
2     6
3     4

Knowing those two sets, we can join and coalesce to get the last observation and carry it forward.

with unit_times as (
  select distinct unit, times.obs_time 
  from test
  cross join (select generate_series(1, 6) obs_time) times
), last_obs_time as (
  select unit, max(obs_time) obs_time
  from test
  group by unit
)
select t1.unit, t1.obs_time, 
       coalesce(t2.obs_value, (select obs_value 
                               from test 
                               inner join last_obs_time 
                                  on test.unit = last_obs_time.unit 
                                 and test.obs_time = last_obs_time.obs_time 
                               where test.unit = t1.unit)) obs_value
from unit_times t1
left join test t2 
       on t1.unit = t2.unit and t1.obs_time = t2.obs_time
order by t1.unit, t1.obs_time;
unit obs_time  obs_value
--
1    1         3.8
1    2         3.1
1    3         2.0
1    4         2.0
1    5         2.0
1    6         2.0
2    1         4.1
. . . 
3    4         3.5
3    5         3.5
3    6         3.5

To get the same visual output as the linked article shows, use the crosstab() function in the tablefunc module. You could also do that manipulation with application code.

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • Why not `COALESCE (obs_value, last_value(obs_value) OVER(PARTITION BY unit ORDER BY obs_time) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)` ? – Ihor Romanchenko Dec 09 '14 at 20:54
  • Because that expression doesn't work. (The last_value() function doesn't do what you think it does in this case. You'll need a rewrite if you want to use a window function.) I wrote CREATE TABLE and INSERT statements; try it yourself. – Mike Sherrill 'Cat Recall' Dec 09 '14 at 22:58