Looking for something like a Lag function but more complex (SQL)

Question

Im racking my brain about how to do this and I dont think its possible but not sure what to even search for. So the below is a subset of the data (using example numbers)

Original Dataset

and I want to use like a lag function to pull the value down then multiply by % column and have something like the below (using the new calculated value each time)

Final Dataset

I have a feeling its not possible but fingers crossed. Using Redshift if it helps

Thanks in Advance

buddemat · Accepted Answer · 2020-09-27T22:03:29.697

Amazon Redshift is based on Postgresql, and even if there are some differences (cmp. https://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html), it does include the lag() function (see https://docs.aws.amazon.com/redshift/latest/dg/r_WF_LAG.html).

So your statement should probably look something like this (works in postgres)

UPDATE yourtable
   SET number = (q1.percentage * q1.prevnumber)
  FROM (SELECT Months, percentage, lag(number) OVER (ORDER BY Months) as prevnumber 
          FROM yourtable) q1
 WHERE yourtable.Months = q1.Months AND yourtable.Months > 0;

You can also include a condition to only update those fields that do not have a value yet by adding something like AND Number IS NULL to the last WHERE clause. And if you have multiple years, you need to take that into account in your ordering, of course.

Have a look at these two SO posts for more detailed discussions and examples:

Is there a way to access the "previous row" value in a SELECT statement?

PostgreSQL update query with "lag()" window function

EDIT: This query will not update all rows in a single pass, as the query is based on the original table entries and on updating the row of Month 10 will have a NULL value as one of the factors of the multiplication (number from Month 9).

There are different approaches you can use to reach your desired result. I have added two examples below. Both work in postgres and should also work in redshift:

(1) You can repeat the above UPDATE statement the appropriate number of times using a WHILE loop. The following function iteratively updates the table until no more rows exist where number is NULL:

CREATE OR REPLACE FUNCTION update_numbers() RETURNS void AS
$BODY$
BEGIN
    WHILE (SELECT COUNT(*) FROM yourtable WHERE number IS NULL) > 0 LOOP        
        UPDATE yourtable
           SET number = (q1.percentage * q1.prevnumber)
          FROM (SELECT Months, percentage, lag(number) OVER (ORDER BY Months) as prevnumber 
                  FROM yourtable) q1
         WHERE yourtable.number IS NULL
               AND yourtable.Months = q1.Months 
               AND yourtable.Months > 0;
    END LOOP;
END;   
$BODY$
LANGUAGE plpgsql VOLATILE;

(2) You can use a CURSOR to loop through the individual rows in which number is NULL and update each of them one at a time:

CREATE OR REPLACE FUNCTION update_numbers() RETURNS void AS 
$BODY$
DECLARE
    rec record;
    prev numeric;
    cur cursor for (SELECT * FROM yourtable WHERE number IS NULL ORDER BY Months) FOR UPDATE;
BEGIN
    FOR rec in cur LOOP
        SELECT prevnumber INTO prev FROM (SELECT Months, percentage, lag(number) OVER (ORDER BY Months) as prevnumber  
                    FROM yourtable) q1 where q1.months = rec.months;
        UPDATE yourtable
            SET number = (percentage * prev)
            WHERE CURRENT of cur;
    END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;

Calling either of these two with

SELECT update_numbers();

updates all rows of your example table.

Just tested the above (so in my example month 9) but when it tries to update row 10, month 9 current value is blank so it doesnt do anything? there doesnt seem to be a way of saying once that 9 is calculated use that for row 10 (unless i repeat this code x amount of times until they are all populated? — gmorrell, Sep 08 '20 at 23:02

Looking for something like a Lag function but more complex (SQL)

1 Answers1