Cumulative sum based on same column calculated result

Question

I have the following table, for which I am trying to calculate a running balance, and remaining value, but the remaining value is the function of the previously calculated row, as such:

date         PR    amount  total    balance  remaining_value
----------------------------------------------------------
'2020-1-1'   1     1.0     100.0    1.0      100    -- 100 (inital total)
'2020-1-2'   1     2.0     220.0    3.0      320   -- 100 (previous row) + 220 
'2020-1-3'   1    -1.5    -172.5    1.5      160   -- 320 - 160 (see explanation 1)
'2020-1-4'   1     3.0     270.0    4.5      430   -- 160 + 270
'2020-1-5'   1     1.0      85.0    5.5      515   -- 430 + 85 
'2020-1-6'   1     2.0     202.0    7.5      717   -- 575 + 202 
'2020-1-7'   1    -4.0    -463.0    3.5      334.6 -- 717 - 382.4 (see explanation 2)
'2020-1-8'   1    -0.5     -55.0    3.0      ...
'2020-1-9'   1     2.0     214.0    5.0
'2020-1-1'   2     1.0     100      1.0      100   -- different PR: start new running total

The logic is as follows:

For positive amount rows, the remaining value is simply the value from the previous row in column remaining_value + the value in column total from that row.
For negative amount rows, it gets tickier:

Explanation 1: We start with 320 (previous row balance) and from it we remove 1.5/3.0 (absolute value of current row amount divided by previous row balance) and we multiply it by the previous row remaining_value, which is 320. The calculation gives:

320 - (1.5/3 * 320) = 160

Explanation 2: Same logic as above. 717 - (4/7.5 * 717) = 717 - 382.4

4/7.5 here represents the current row's absolute amount divided by the previous row's balance.

I tried the window function sum() but did not manage to get the desired result. Is there a way to get this done in PostgreSQL without having to resort to a loop?

Extra complexity: There are multiple products identified by PR (product id), 1, 2 etc. Each need their own running total and calculation.

There is a date field I use to order them, but for the purpose of this example I just showed the ordered table. You got it, when there are positive totals we simply add them, when there are negative totals we use the formula on the previous total (thus ignoring the negatives). I would add to this that the PR field represents product ID, and there might be multiple product IDs. Usually, I would separate them with a window function and a partition, but in this case not sure how to do.. — AlexM88, Jul 06 '20 at 08:42

Erwin Brandstetter · Accepted Answer · 2020-07-06T11:37:57.080

You could create a custom aggregate function:

CREATE OR REPLACE FUNCTION f_special_running_sum (_state numeric, _total numeric, _amount numeric, _prev_balance numeric)
  RETURNS numeric
  LANGUAGE sql IMMUTABLE AS
'SELECT CASE WHEN _amount > 0 THEN _state + _total
             ELSE _state * (1 + _amount / _prev_balance) END';

CREATE OR REPLACE AGGREGATE special_running_sum (_total numeric, _amount numeric, _prev_balance numeric) (
  sfunc    = f_special_running_sum 
, stype    = numeric
, initcond = '0'
);

The CASE expression does the split: If amount is positive, just add total, else apply your (simplified) formula:
320 * (1 + -1.5 / 3.0) instead of 320 - (1.5/3 * 320), i.e.:

_state * (1 + _amount / _prev_balance)

Function and aggregate parameter names are only for documentation.

Then your query can look like this:

SELECT *
     , special_running_sum(total, amount, prev_balance) OVER (PARTITION BY pr ORDER BY date)
FROM  (
   SELECT pr, date, amount, total
        , lag(balance, 1, '1') OVER (PARTITION BY pr ORDER BY date) AS prev_balance
   FROM   tbl
   ) t;

db<>fiddle here

We need a subquery to apply the first window function lag() and fetch the previous balance into the current row (prev_balance). I default to 1 if there is no previous row to avoid NULL values.

Caveats:

If the first row has a negative total, the result is undefined. My aggregate function defaults to 0.
You did not declare data types, nor requirements regarding precision. I assume numeric and aim for maximum precision. The calculation with numeric is precise. But your formula produces fractional decimal numbers. Without rounding, there will be a lot of fractional digits after a couple of divisions, and the calculation will quickly degrade in performance. You'll have to strike a compromise between precision and performance. For example, doing the same with double precision has constant performance.

Cumulative adding with dynamic base in Postgres

Thank you so much! Did some testing and it works in all cases. indeed I replaced by double precision, which is good for my purpose. — AlexM88, Jul 06 '20 at 21:47

Cumulative sum based on same column calculated result

1 Answers1