0

My analytics involves the need to aggregate rows and to store the number of different values occurrences of a field someField in all the rows.

Sample data structure [someField, someKey]

I'm trying to GROUP BY someKey and then be able to know for each of the results how many time there was each someField values

Example:

[someField: a, someKey: 1],
[someField: a, someKey: 1],
[someField: b, someKey: 1],
[someField: c, someKey: 2],
[someField: d, someKey: 2]

What I would like to achieve:

[someKey: 1, fields: {a: 2, b: 1}],
[someKey: 2, fields: {c: 1, d: 1}],
Quentin Sommer
  • 195
  • 2
  • 14

3 Answers3

2

Does it work for you?

WITH data AS (
  select 'a' someField, 1 someKey UNION all
  select 'a', 1 UNION ALL
  select 'b', 1 UNION ALL
  select 'c', 2 UNION ALL
  select 'd', 2)

SELECT
  someKey,
  ARRAY_AGG(STRUCT(someField, freq)) fields
FROM(
  SELECT
    someField,
    someKey,
    COUNT(someField) freq
  FROM data
  GROUP BY 1, 2
)
GROUP BY 1

Results:

enter image description here

It won't give exactly the results you are looking for, but it might work to receive the same queries your previous result would. As you said, for each key you can retrieve how many times (column freq) someField happened.

I've been looking for a way on how to aggregate structs and couldn't find one. But retrieving the results as an ARRAY of STRUCTS turned out to be quite straightforward.

Willian Fuks
  • 11,259
  • 10
  • 50
  • 74
1

There's probably a smarter way to do this (and get it in the format you want e.g. using an Array for the 2nd column), but this might be enough for you:

with sample as (
select 'a' as someField, 1 as someKey UNION all
select 'a' as someField, 1 as someKey UNION ALL
select 'b' as someField, 1 as someKey UNION ALL
select 'c' as someField, 2 as someKey UNION ALL
select 'd' as someField, 2 as someKey)

SELECT
  someKey,
  SUM(IF(someField = 'a', 1, 0)) AS a,
  SUM(IF(someField = 'b', 1, 0)) AS b,
  SUM(IF(someField = 'c', 1, 0)) AS c,
  SUM(IF(someField = 'd', 1, 0)) AS d
FROM
  sample
GROUP BY
  someKey order by somekey asc

Results:

someKey a   b   c   d
---------------------    
  1     2   1   0   0    
  2     0   0   1   1

This is well used technique in BigQuery (see here).

Graham Polley
  • 14,393
  • 4
  • 44
  • 80
  • The a, b, c, and d keys are potentially infinite and unknown at the start, I can juste assume the field name – Quentin Sommer Jun 21 '17 at 12:13
  • It's still possible with unknown keys. You run a query first to generate the SQL. See here -> https://stackoverflow.com/questions/34010002/how-to-create-dummy-variable-columns-for-thousands-of-categories-in-google-bigqu/34010803#34010803 – Graham Polley Jun 21 '17 at 22:26
0

I'm trying to GROUP BY someKey and then be able to know for each of the results how many time there was each someField values

#standardSQL
SELECT
  someKey,
  someField,
  COUNT(someField) freq
FROM yourTable
GROUP BY 1, 2
-- ORDER BY someKey, someField  

What I would like to achieve:
[someKey: 1, fields: {a: 2, b: 1}],
[someKey: 2, fields: {c: 1, d: 1}],

This is different from what you expressed in words - it is called pivoting and based on your comment - The a, b, c, and d keys are potentially infinite - most likely is not what you need. At the same time - pivoting is easily doable too (if you have some finite number of field values) and you can find plenty of related posts

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230