3

My Schema looks something like this:

userid:string
timestamp:integer
params:nested/repeated field with 2 fields
  - name:string (possible values: "a", "b","c")
  - value:string

I want my query to return the following:

userid, timestamp, a, b, c
123, 1447799796, foo, bar, xyz
233, 1447799900, bob, xxx, yyy
:
:

What's the easiest way to do this?

David M Smith
  • 2,212
  • 4
  • 21
  • 27

2 Answers2

3

when possible values are known in advance and there are not that many of them to write manually SQL - you can use below:

SELECT 
  userid,
  ts,
  MAX(IF(params.name = "a", params.value, NULL)) WITHIN RECORD a,
  MAX(IF(params.name = "b", params.value, NULL)) WITHIN RECORD b,
  MAX(IF(params.name = "c", params.value, NULL)) WITHIN RECORD c
FROM yourTable 

If possible values are "unknown" in advance and/or dynamic from run to run, you can use below helper SQL to generate above type of SQL.

SELECT 'select userid, ts, ' + 
   GROUP_CONCAT_UNQUOTED(
      'max(if(params.name = "' + STRING(params.name) + '", params.value, null)) 
       WITHIN RECORD as [' + STRING(params.name) + ']'
   ) 
   + ' from yourTable '
FROM (SELECT params.name FROM yourTable GROUP BY params.name)
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Thanks Mike, what about using the SQL standard version of BigQuery ? – Florent Aug 11 '17 at 10:41
  • @FlorentCailly - Good question! A lot has changed since Nov 2015. Standard SQL of BigQuery is highly recommended nowadays! I migrated here long ago :) with exception of when I need to use snapshots (which are still available in Legacy SQL Only) – Mikhail Berlyant Aug 11 '17 at 15:53
  • Thanks Mike, could you post an update on how to do this using standard SQL ? For some reason, when I use the methodology above and then aggregate, I am not getting the correct result. If I use `FIRST` instead of `MAX`, I am getting a result much closer but still not the correct result. Here is the query: – Florent Aug 11 '17 at 18:54
  • if you can post new question with your specifics - i will be more than happy to answer :o) – Mikhail Berlyant Aug 11 '17 at 18:55
  • Thanks Mike, I just did: https://stackoverflow.com/questions/45642206/big-query-pivot-and-aggregate-repeated-fields – Florent Aug 11 '17 at 19:05
1

Something along these lines:

SELECT 
  userid,
  timestamp,
  FIRST(name == "a", value, NULL) WITHIN RECORD a,
  FIRST(name == "b", value, NULL) WITHIN RECORD b,
  FIRST(name == "c", value, NULL) WITHIN RECORD c,
FROM t
Mosha Pasumansky
  • 13,206
  • 5
  • 32
  • 55