-1

I'm fairly new to BigQuery and Standard SQL and am trying to calculate a cumulative sum, where the resulting column should not go below 0 and keep on calculating the cumulative sum from there. This question is similiar, but uses SQL Server.

Let's say I have a table like the following:

sample AS (
    SELECT 0 as id, -1 as x
    UNION ALL SELECT 1, 2
    UNION ALL SELECT 2, 1
    UNION ALL SELECT 3, 3
    UNION ALL SELECT 4, -7
    UNION ALL SELECT 5, 1
    UNION ALL SELECT 6, 1
    UNION ALL SELECT 7, -4
    UNION ALL SELECT 8, 1
)

Calculating the cumulative sum using this code snippet:

SELECT 
    id,
    x,
    SUM(x) OVER (
        ORDER BY id ASC
        ROWS UNBOUNDED PRECEDING
    ) AS cumulative_sum
FROM sample

results in:

id   |   x   |  cum_sum
-------------------------
 0   |   -1  |    -1
 1   |    2  |     1
 2   |    1  |     2
 3   |    3  |     5
 4   |   -7  |    -2
 5   |    1  |    -1
 6   |    1  |     0
 7   |   -4  |    -4
 8   |    1  |    -3

But what I am trying to achieve is a result like this:

id   |   x   |  cum_sum
-------------------------
 0   |   -1  |     0
 1   |    2  |     2
 2   |    1  |     3
 3   |    3  |     6
 4   |   -7  |     0
 5   |    1  |     1
 6   |    1  |     2
 7   |   -4  |     0
 8   |    1  |     1

Is this possible using Standard SQL in BigQuery? Happy about any suggestions!


EDIT: I have tried using a JavaScript UDF, like the following:

CREATE TEMP FUNCTION cumSumBounded(x ARRAY<NUMERIC>)
RETURNS BIGNUMERIC
LANGUAGE js AS r"""
    let p=0;
    let l=0;
    return x.map(c => p=(p+=c)<l?l:p).pop();
""";

And then use

SELECT
    id,
    x,
    cumSumBounded(ARRAY_AGG(x)) AS cumulative_sum
FROM sample
GROUP BY id, x
ORDER BY id ASC

But this doesn't quite do the trick. The JS function works as expected, but I'm having trouble figuring out how to best pass the data to it since it doesn't work with an OVER clause like the SUM() function.

In addition to this, how do I make sure that the data is passed in the correct order, e.g. by ID? Is it sufficient to use ORDER BY in the same part I'm calling the JS function?

Also, the data I'm going to pass to the function may vary from just a few dozen rows to a couple million rows. Is this going to work regardless of the number of rows or are there any ressource limitations? Especially when I have to order the data first.

  • Do you have or could you make a JS function that gives whatever answer you want, if you passed it an array of the history values? If so, I can point you in the right direction. – Mike Karp Jan 10 '22 at 04:18
  • Hello Mike, I have tried a JavaScript UDF and edited my question. Thanks for taking a look at my problem! – nothingnix Jan 11 '22 at 10:19
  • Bigquery arrays are ordered lists (https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays), and ARRAY_AGG() accepts OVER() details including ORDER BY (https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg) – Mike Karp Jan 11 '22 at 20:23
  • Aha! This is so helpful, can't believe I missed this. Thanks a lot! I'll put together a working example and put it here as a solution to my problem for further reference. – nothingnix Jan 11 '22 at 21:43

1 Answers1

0

After reading Mike's hints and this example of a bounded cumulative sum written in JavaScript, I have come up with the following solution:

CREATE TEMP FUNCTION CUMSUMBOUNDED(x ARRAY<NUMERIC>)
RETURNS NUMERIC
LANGUAGE js AS r"""
let p=0;
const l=0;
return x.map(c => p=(p+=c)<l?l:p).pop();
""";

WITH sample AS (
    SELECT 0 as id, -1 as x
    UNION ALL SELECT 1, 2
    UNION ALL SELECT 2, 1
    UNION ALL SELECT 3, 3
    UNION ALL SELECT 4, -7
    UNION ALL SELECT 5, 1
    UNION ALL SELECT 6, 1
    UNION ALL SELECT 7, -4
    UNION ALL SELECT 8, 1
)

SELECT 
    id,
    x,
    CUMSUMBOUNDED(
        ARRAY_AGG(CAST(x AS NUMERIC)) OVER (
            ORDER BY id ASC
            ROWS UNBOUNDED PRECEDING
        )
    ) AS cumulative_sum
FROM sample

This gives the desired result, but is quite slow on larger datasets. Also keep in mind that the sum might overflow.

Happy about suggestions to increase speed if anyone has a better idea!