
I have the following dataset.

date orig dest value
11-4 S8 B9 -42
10-30 S8 B9 -2999
10-15 S8 B9 959
10-14 S8 B9 480
10-13 S8 B9 0
10-9 S8 B9 503

How do I convert it into something like this?

date orig dest value new_value
11-4 S8 B9 -42 959
10-30 S8 B9 -2999 959
10-15 S8 B9 959 959
10-14 S8 B9 480 480
10-13 S8 B9 0 503
10-9 S8 B9 503 503

This is what I'm using, and it is incorrect:

CASE WHEN value <= 0
     THEN last_value(value) over (PARTITION BY orig, dest ORDER BY date DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
     ELSE value 
END as new_value

The goal here is not to replace the <= 0 values with the maximum recent value, but rather with the most recent positive value.

John Rotenstein

1 Answer


This should do it in Redshift:

SELECT date, orig, dest, value
     , max(value) OVER (PARTITION BY orig, dest, grp) AS new_value
FROM  (
   SELECT date, orig, dest, value
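        -- Redshift requires an explicit frame when an aggregate window function uses ORDER BY
        , count(value > 0 OR NULL) OVER (PARTITION BY orig, dest ORDER BY date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp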
   FROM   tbl
   ) sub
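
As a quick check of the query above against the sample data, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. SQLite is only a stand-in for Redshift here, and this needs SQLite 3.25 or later for window functions. The year 2020 in the dates and the helper name fill_forward are assumptions made for illustration. The window frame is written out explicitly, which does not change the result for this data. The grp column is included in the output to show how the groups form.

```python
import sqlite3

# Sample rows from the question. The year 2020 is an assumption, added so
# that the dates sort chronologically as ISO strings.
ROWS = [
    ("2020-11-04", "S8", "B9", -42),
    ("2020-10-30", "S8", "B9", -2999),
    ("2020-10-15", "S8", "B9", 959),
    ("2020-10-14", "S8", "B9", 480),
    ("2020-10-13", "S8", "B9", 0),
    ("2020-10-09", "S8", "B9", 503),
]

# Same shape as the query above, with grp exposed and an explicit frame.
QUERY = """
SELECT date, value, grp
     , max(value) OVER (PARTITION BY orig, dest, grp) AS new_value
FROM  (
   SELECT date, orig, dest, value
        , count(value > 0 OR NULL) OVER (PARTITION BY orig, dest ORDER BY date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
   FROM   tbl
   ) sub
ORDER  BY date DESC
"""


def fill_forward(rows):
    """Load rows into an in-memory table and return (date, value, grp, new_value) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (date TEXT, orig TEXT, dest TEXT, value INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in fill_forward(ROWS):
        print(row)
```

For this data, the three newest rows end up in the same group as the 959 row, and the 0 row shares a group with the 503 row, which matches the desired output in the question.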

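Explanation:

The subquery keeps a running count of positive values per (orig, dest) in date order. The count increases by one at each row with value > 0, so every positive row and the non-positive rows that follow it share the same grp. Each such group contains exactly one positive value, and max(value) over the group returns it. Rows that come before the first positive value fall into grp 0 and get the largest non-positive value of that group.

About count(expression OR NULL): value > 0 OR NULL evaluates to TRUE for positive values and to NULL otherwise, and count() ignores NULL, so only positive values are counted.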

Unfortunately, Redshift did not keep up with improvements to Postgres and lacks the aggregate FILTER clause (since Postgres 9.4!), which would be faster and more elegant:

SELECT date, orig, dest, value
     , max(value) OVER (PARTITION BY orig, dest, grp) AS new_value
FROM  (
   SELECT date, orig, dest, value
        , count(*) FILTER (WHERE value > 0) OVER (PARTITION BY orig, dest ORDER BY date) AS grp
   FROM   tbl
   ) sub

Erwin Brandstetter