I want to find a way to get the last non-null value for a purchase of a user. For example, I have this:

date        | user_id | purchase_amount
2020-01-01  | 1       | 39
2020-01-04  | 1       | null
2020-01-10  | 1       | 90
2020-01-15  | 1       | null

I want it to be filled in the following way:

date        | user_id | purchase_amount
2020-01-01  | 1       | 39
2020-01-04  | 1       | 39
2020-01-10  | 1       | 90
2020-01-15  | 1       | 90

Right now, I am using a LEAD function partitioned by user_id, but it does not distinguish between null and non-null values. How should I approach this?

titutubs

1 Answer

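As a disclaimer, this answer is largely based on the excellent accepted answer to this SO question. It creates a pseudo-group for each block of records that should ultimately be assigned the same purchase amount. The group number is a running count of non-null purchase amounts per user, so it increases only at a non-null row, and each non-null row shares its group with the null rows that follow it. The query then uses the FIRST_VALUE() function, which should be available on Redshift, to fill in the null gaps with the first (non-null) value of each group.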

WITH cte AS (
    SELECT *, SUM(CASE WHEN purchase_amount IS NULL THEN 0 ELSE 1 END)
                  OVER (PARTITION BY user_id ORDER BY date) AS grp
    FROM yourTable
)

SELECT date, user_id,
       FIRST_VALUE(purchase_amount) OVER (PARTITION BY grp, user_id
                                          ORDER BY date) AS purchase_amount
FROM cte
ORDER BY user_id, date;
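
To make the grouping step easier to follow, here is a minimal, self-contained sketch that runs the same idea against the four sample rows from the question and exposes the intermediate grp column. The inline table name purchases and the output column names original_amount and filled_amount are hypothetical, chosen only for illustration. The explicit ROWS BETWEEN frame is an assumption on my part: Postgres does not need it, but some engines (Redshift is reported to be one of them) reject an aggregate window function that has an ORDER BY without a frame clause, and spelling it out is harmless elsewhere.

```sql
-- Sample rows copied from the question; "purchases" is a hypothetical inline table.
WITH purchases AS (
    SELECT CAST('2020-01-01' AS DATE) AS date, 1 AS user_id, 39 AS purchase_amount
    UNION ALL SELECT CAST('2020-01-04' AS DATE), 1, NULL
    UNION ALL SELECT CAST('2020-01-10' AS DATE), 1, 90
    UNION ALL SELECT CAST('2020-01-15' AS DATE), 1, NULL
),
cte AS (
    -- Running count of non-null amounts per user: it increases only on a
    -- non-null row, so trailing null rows inherit the same group number.
    SELECT *,
           SUM(CASE WHEN purchase_amount IS NULL THEN 0 ELSE 1 END)
               OVER (PARTITION BY user_id ORDER BY date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM purchases
)
SELECT date, user_id,
       purchase_amount AS original_amount,
       grp,
       -- The first row of each group (by date) holds the non-null amount.
       FIRST_VALUE(purchase_amount)
           OVER (PARTITION BY user_id, grp ORDER BY date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS filled_amount
FROM cte
ORDER BY user_id, date;
```

For the sample data, grp comes out as 1, 1, 2, 2 and filled_amount as 39, 39, 90, 90. Rows that precede a user's first non-null purchase fall into group 0 and stay null, since there is no earlier value to carry forward.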

Demo

Note that the demo is for Postgres, but the code should also run on Redshift without modification.

Tim Biegeleisen