0

I'm trying to query a table that uses a basic repeating field to store data like this:

+---+----------+------------+
| i | data.key | data.value |
+---+----------+------------+
| 0 | a        |          1 |
|   | b        |          2 |
| 1 | a        |          3 |
|   | b        |          4 |
| 2 | a        |          5 |
|   | b        |          6 |
| 3 | a        |          7 |
|   | b        |          8 |
+---+----------+------------+

I'm trying to figure out how to run a query that gets a result like

+---+----+----+
| i | a  | b  |
+---+----+----+
| 1 |  4 |  6 |
| 3 | 12 | 14 |
+---+----+----+

where each row represents a non-overlapping sum (i.e. i=1 is the sum of rows i=0 and i=1) and the data has been pivoted such that data.key is now a column.

Problem 1:

I did my best to convert this answer to use Standard SQL and ended up with:

SELECT
    i,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') as `a`,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') as `b`
  FROM
    `dataset.testing.dummy`)

This works, but I'm wondering if there is a better way to do this, especially since it produces a particularly verbose query when trying to use analytic functions:

SELECT
  i,
  SUM(a) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `a`,
  SUM(b) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `b`
FROM (
  SELECT
    i,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') as `a`,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') as `b`
  FROM
    `dataset.testing.dummy`)
ORDER BY
  i;

Problem 2:

How do I write a ROW or RANGE statement such that resulting windows don't overlap. In the last query, I get a rolling sum over the data, which isn't quite what I'm looking to do.

+---+----+----+
| i | a  | b  |
+---+----+----+
| 0 |  1 |  2 |
| 1 |  4 |  6 |
| 2 |  8 | 10 |
| 3 | 12 | 14 |
+---+----+----+

The rolling sum produces a result for each row, whereas I'm attempting to reduce the number of rows returned.

skycoop
  • 311
  • 1
  • 3
  • 7

1 Answers1

1

Using a temporary SQL function plus a named window helps with the verbosity. I had to use another subselect to apply the filter on i afterward, though. Here's a self-contained example:

#standardSQL
CREATE TEMP FUNCTION SumKey(
    data ARRAY<STRUCT<key STRING, value INT64>>,
    target_key STRING) AS (
  (SELECT SUM(value) FROM UNNEST(data) WHERE key = target_key) 
);

WITH Input AS (
  SELECT
    0 AS i,
    ARRAY<STRUCT<key STRING, value INT64>>[('a', 1), ('b', 2)] AS data UNION ALL
  SELECT 1, ARRAY<STRUCT<key STRING, value INT64>>[('a', 3), ('b', 4)] UNION ALL
  SELECT 2, ARRAY<STRUCT<key STRING, value INT64>>[('a', 5), ('b', 6)] UNION ALL
  SELECT 3, ARRAY<STRUCT<key STRING, value INT64>>[('a', 7), ('b', 8)]
)
SELECT * FROM (
  SELECT
    i,
    SUM(a) OVER W AS a,
    SUM(b) OVER W AS b
  FROM (
    SELECT
      i,
      SumKey(data, 'a') AS a,
      SumKey(data, 'b') AS b
    FROM Input
  )
  WINDOW W AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
)
WHERE MOD(i, 2) = 1
ORDER BY i;

This results in:

+---+----+----+
| i | a  | b  |
+---+----+----+
| 1 |  4 |  6 |
| 3 | 12 | 14 |
+---+----+----+
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99