0

I have some logs with dates and an aggregated minutes counter time.

Each hours (a row) had been filled out with a trick like this one : Duplicate groups of records to fill multiple date gaps in Google BigQuery

The issue :

I want to complete time column with minutes still available (understand here, 60 minutes maximum per hour)

Here's the desired output : remainingTime is resulting from previous rows.. Let's say remainingTime = 70 minutes

#standardSQL
WITH history AS (
  SELECT '2017-01-01' AS date, 'a' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-02' AS date, 'a' AS product, 100 AS minutes UNION ALL
  SELECT '2017-01-03' AS date, 'a' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-04' AS date, 'a' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-05' AS date, 'a' AS product, 30 AS minutes UNION ALL
  SELECT '2017-01-06' AS date, 'a' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-01' AS date, 'b' AS product, 100 AS minutes UNION ALL
  SELECT '2017-01-02' AS date, 'b' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-03' AS date, 'b' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-04' AS date, 'b' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-05' AS date, 'b' AS product, 0 AS minutes 

+---------+------------+---------+---------------+---------------------+
| product |    date    | minutes | remainingTime |       time          |
+---------+------------+---------+---------------------------------+
|    a    | 2017-01-01 |   0     |  10           | 60 (max 60 reached) | // 0 + 70 - 60 = 10
|    a    | 2017-01-02 |   100   |  50           | 60 (same)           | // 100 + 10 - 60 = 50
|    a    | 2017-01-03 |   0     |  0            | 50 (only 50/60)     | // 0 + 50 - 50 = 0            
|    a    | 2017-01-04 |   0     |  0            | 0 (and so on)       | // 0 + 0 - 0 = 0
|    a    | 2017-01-05 |   30    |  0            | 30                  | // 30 + 0 - 30 = 0
|    a    | 2017-01-06 |   0     |  0            | 0                   | // 0 + 0 - 0 = 0
 

... and so on for other products
+---------------+--------+------+

I'm almost done with a complex and ugly query but I'm currently stuck with a temporary calculation column..

(PS : I didn't practice SQL for many years, so I'm learning back again fundamentals and discovering BigQuery Standard SQL in the same time)

Thanks !

nicolas.grd
  • 328
  • 1
  • 2
  • 8

1 Answers1

0

BigQuery does not support recursive operations natively. Try array_agg() combined with JavaScript user-defined function, but this approach is not very scalable:

CREATE TEMP FUNCTION special_sum(x ARRAY<INT64>)
RETURNS INT64
LANGUAGE js
AS """
  var remaining = 70;
  var time = 0;
  for (const num of x)
  {
     time = Math.min(parseInt(num) + remaining, 60);
     remaining = parseInt(num) + remaining - time;
  }
  return time;
""";

WITH history AS (
  SELECT '2017-01-01' AS date, 'a' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-02' AS date, 'a' AS product, 100 AS minutes UNION ALL
  SELECT '2017-01-03' AS date, 'a' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-04' AS date, 'a' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-05' AS date, 'a' AS product, 30 AS minutes UNION ALL
  SELECT '2017-01-06' AS date, 'a' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-01' AS date, 'b' AS product, 100 AS minutes UNION ALL
  SELECT '2017-01-02' AS date, 'b' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-03' AS date, 'b' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-04' AS date, 'b' AS product, 0 AS minutes UNION ALL
  SELECT '2017-01-05' AS date, 'b' AS product, 0 AS minutes 
)
select *, 
 special_sum(array_agg(minutes) over (partition by product order by date rows unbounded preceding)) as time 
from history

enter image description here

Sergey Geron
  • 9,098
  • 2
  • 22
  • 29
  • TY Sergey ! I upvote your solution because it's working pretty well ! It looks very interesting.. I keep on looking for a "smartest" way to achieve this – nicolas.grd Jan 20 '21 at 09:41