0

How can I use SQL to compute a cumulative sum over a column, so that the cumulative sum always stays within upper/lower bounds. Example with lower bound -2 and upper bound 10, showing the regular cumulative sum and the bounded cumulative sum.

id      input 
-------------
 1       5   
 2       7   
 3     -10   
 4     -10   
 5       5   
 6      10   

Result:

id    cum_sum    bounded_cum_sum  
---------------------------------
  1       5          5     
  2      12         10    
  3       2          0
  4      -8         -2
  5      -3          3     
  6       7         10

See https://codegolf.stackexchange.com/questions/61684/calculate-the-bounded-cumulative-sum-of-a-vector for some (non SQL) examples of a bounded cumulative sum.

Petter T
  • 3,387
  • 2
  • 19
  • 31
  • https://stackoverflow.com/questions/2120544/how-to-get-cumulative-sum/2120639 – GenericDisplayName Oct 02 '19 at 19:20
  • What you mean upper/lower bounds? Please explain the logic – Juan Carlos Oropeza Oct 02 '19 at 19:20
  • See the example given. The cum_sum column is the normal cumulative sum (or running total if you want). The bounded_cum_sum is what I am after. E.g. in the second row the cumulative sum equals 12, but since this value is above the upper bound, the bounded_cum_sum gets a value equal to its upper bound, that is 10. – Petter T Oct 02 '19 at 19:27
  • This is not the same as https://stackoverflow.com/questions/2120544/how-to-get-cumulative-sum, as that question considers the "regular" cumulative sum. – Petter T Oct 02 '19 at 19:32
  • @JuanCarlosOropeza: I have added a link to another Stack Exchange question with more explanation on a bounded cumulative sum and upper/lower bounds (not related to SQL) – Petter T Oct 02 '19 at 19:57
  • What have you tried so far??? – Eric Oct 02 '19 at 20:06
  • Also you need to provide the order in which we need to do cumulative sum, without any ordering column sql server might take any order not the same every results – Kannan Kandasamy Oct 02 '19 at 20:12
  • @Eric I am sorry, but SQL is not my field of expertise, so it is too complex for me to create such a query myself. – Petter T Oct 02 '19 at 20:33
  • @KannanKandasamy: Have updated with an Id that can be the ordering column. – Petter T Oct 02 '19 at 20:38
  • @PetterT SO is not a free coding service website. You should attempt to at least create a query. If you're stuck, we can help. But if you just ask for solution, I doubt no one will help. – Eric Oct 02 '19 at 22:07
  • I helped because I don't expect a non-analyst or non-SQL-expert could have tried in any sensible direction for this kind of "conditional cumulative" tasks. I can recall vividly the day the helpless junior data analyst banging his head in his office because he hadn't heard of "cursor" but kept messing around with multiple "with"s and advanced aggregate functions such as row_number(). ROTFLOL – Bill Huang Oct 02 '19 at 22:37
  • @Eric: I do agree. But if I should solve this kind of problem myself it would probably be to create an SQL query importing all the rows into C# (where the results are going anyway) and solve the problem there. (C# is one of my fields of expertise) Writing up such a solution would probably not bring any value to the question though. – Petter T Oct 03 '19 at 07:24
  • @cbhuang: Thanks for your solution. Your comment hits the nail on the head, this is far beyond anything I could write myself :-) – Petter T Oct 03 '19 at 07:27

1 Answers1

2

You can (almost) always use a cursor to implement whatever cumulative logic you have. The technique is quite routine so can be used to tackle a variety of problems easily once you get it.

One specific thing to note: Here I update the table in-place, so the [id] column must be uniquely indexed.

(Tested on SQL Server 2017 latest linux docker image)

Test Dataset

use [testdb];
if OBJECT_ID('testdb..test') is not null
    drop table testdb..test;

create table test (
    [id] int,
    [input] int,
);

insert into test (id, input)
values (1,5), (2,7), (3,-10), (4,-10), (5,5), (6,10);

Solution

/* A generic row-by-row cursor solution */

-- First of all, make [id] uniquely indexed to enable "where current of" 
create unique index idx_id on test(id);

-- append answer columns
alter table test 
    add [cum_sum] int,
        [bounded_cum_sum] int;

-- storage for each row
declare @id int,
        @input int,
        @cum_sum int, 
        @bounded_cum_sum int;
-- record accumulated values
declare @prev_cum_sum int = 0,
        @prev_bounded_cum_sum int = 0;

-- open a cursor ordered by [id] and updatable for assigned columns
declare cur CURSOR local
for select [id], [input], [cum_sum], [bounded_cum_sum]
    from test
    order by id
for update of [cum_sum], [bounded_cum_sum];
open cur;

while 1=1 BEGIN

    /* fetch next row and check termination condition */
    fetch next from cur 
        into @id, @input, @cum_sum, @bounded_cum_sum;

    if @@FETCH_STATUS <> 0
        break;

    /* program body */

    -- main logic
    set @cum_sum = @prev_cum_sum + @input;
    set @bounded_cum_sum = @prev_bounded_cum_sum + @input;
    if @bounded_cum_sum > 10 set @bounded_cum_sum=10
    else if @bounded_cum_sum < -2 set @bounded_cum_sum=-2;

    -- write the result back
    update test 
        set [cum_sum] = @cum_sum,
            [bounded_cum_sum] = @bounded_cum_sum
        where current of cur;

    -- setup for next row
    set @prev_cum_sum = @cum_sum;
    set @prev_bounded_cum_sum = @bounded_cum_sum;
END

-- cleanup
close cur;
deallocate cur;

-- show
select * from test;

Result

|   | id | input | cum_sum | bounded_cum_sum |
|---|----|-------|---------|-----------------|
| 1 | 1  | 5     | 5       | 5               |
| 2 | 2  | 7     | 12      | 10              |
| 3 | 3  | -10   | 2       | 0               |
| 4 | 4  | -10   | -8      | -2              |
| 5 | 5  | 5     | -3      | 3               |
| 6 | 6  | 10    | 7       | 10              |
Bill Huang
  • 4,491
  • 2
  • 13
  • 31
  • Yeah, most people would rather do this with a general purpose language on her/his own workstation. However, if your company were too generous to provide access to pandas DataFrame and you were too timid to switch a job, then life will likely find its own way out -- on the SQL Server..... – Bill Huang Oct 03 '19 at 00:19
  • You'd be surprised at how annoying this is to do at scale in a pandas DataFrame ;-) – Chris Withers May 08 '21 at 10:39