Bigquery unnest and pivot column

Question

I have a very big table in BigQuery formatted as follows:

row	column.key	column.value.string_value	column.value.int_value	column.value.float_value	column.value.boolean_value	id
1	key1	aa	null	null	null	id1
	key2	null	null	null	false
	key3	fa	null	null	null
	key4	null	null	null	true
2	key1	ab	null	null	null	id1
	key2	null	null	null	false
	key3	gf	null	null	null
	key4	null	null	null	false
3	key1	af	null	null	null	id2
	key2	null	null	null	true
	key3	fa	null	null	null
	key4	null	null	null	false

I need to re-format it as follows:

row	key1	key2	key3	key4	id
1	aa	false	fa	true	id1
2	ab	false	gf	false	id1
3	af	true	fa	false	id2

I tried to use this but couldn't make it work in my table

How to unnest and pivot two columns in BigQuery

One important thing in my table is that, like in the above table, the ids repeat in different rows but have different values for the same keys. Where the ids repeat I want to still have all incidences of the ids, like in the formatted table example.

Also I put only 4 keys here but I actually have 50, so the least manual it is, the better. If the only way is writing each column manually then I'll still do it though.

Does anyone know how to do it? I basically need to turn every key into a new column with it's corresponding value associated with the ids, but I don't know how to do it.

Thank you very much!

Jaytiger · Accepted Answer · 2023-03-10T13:39:51.873

In order to re-format it, firstly you need to merge column.value struct into one single column value. This can be easily done using COALESCE function since usually there is only one non-null value in column.value struct in Google Analytics.

COALESCE(
  value.string_value, 
  '' || value.int_value, -- to covert int64 to string
  '' || value.float_value,
  '' || value.boolean_value
) value,

After that, you can simply reshape it using PIVOT query.

SELECT * FROM (
  SELECT t.* EXCEPT(column),
         key,
         COALESCE(
           value.string_value, 
           '' || value.int_value,
           '' || value.float_value,
           '' || value.boolean_value
         ) value,
    FROM sample_table t, UNNEST(column)
) PIVOT (ANY_VALUE(value) FOR key IN ('key1', 'key2', 'key3', 'key4'));

Query results

sample_table

WITH sample_table AS (
SELECT 1 row,
       'id1' id,
       [STRUCT('key1' AS key, STRUCT('aa' AS string_value, INT64(null) AS int_value, FLOAT64(null) AS float_value, BOOL(null) AS boolean_value) AS value),
        STRUCT('key2' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value),
        STRUCT('key3' AS key, STRUCT('fa' AS string_value, null AS int_value, null AS float_value, null AS boolean_value) AS value),
        STRUCT('key4' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, true AS boolean_value) AS value)] column
 UNION ALL
SELECT 2 row, 'id1' id,
       [STRUCT('key1' AS key, STRUCT('ab' AS string_value, INT64(null) AS int_value, FLOAT64(null) AS float_value, BOOL(null) AS boolean_value) AS value),
        STRUCT('key2' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value),
        STRUCT('key3' AS key, STRUCT('gf' AS string_value, null AS int_value, null AS float_value, null AS boolean_value) AS value),
        STRUCT('key4' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value)] column
 UNION ALL
SELECT 3 row, 'id2' id,
       [STRUCT('key1' AS key, STRUCT('af' AS string_value, INT64(null) AS int_value, FLOAT64(null) AS float_value, BOOL(null) AS boolean_value) AS value),
        STRUCT('key2' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, true AS boolean_value) AS value),
        STRUCT('key3' AS key, STRUCT('fa' AS string_value, null AS int_value, null AS float_value, null AS boolean_value) AS value),
        STRUCT('key4' AS key, STRUCT(null AS string_value, null AS int_value, null AS float_value, false AS boolean_value) AS value)] column
)

You can further generalize it with a dynamic query if you have many keys in your table refering to the link in your question or below.

How to Pivot table in BigQuery

Thank you very much for your answer! I works very nicely and It's going to help me out a lot. But I think I forgot to mention another important thing about my table. Sometimes the ids repeat in different rows, and when the ids repeat they have different values for the same keys. For those cases I want to keep all incidences of one id but in different rows. This code I'm not sure if it only uses the first insidence or if it sort of blends them, but it only leaves one of them. Is that possible to do? I'll also edit my original question so that it's clear what I mean. Thank you! — AldanaBRZ, Mar 10 '23 at 12:25
@AldanaBRZ, Above query returns the result you expect without modification, I think. It's because `(row, id)` combination is unique for each row regardless of duplicated `id`s. If your real dataset doesn't have a `row` column, you can add it using `ROW_NUMBER()` or other ways. I'ved updated my answer with `sample_table`, so try it and let me know again if I'm missing somthing from your explanation. — Jaytiger, Mar 10 '23 at 13:32

Bigquery unnest and pivot column

1 Answers1

Linked