2

Say I have a BQ table containing the following information

id test.name test.score
1 a 5
b 7
2 a 8
c 3

Where test is nested. How would I pivot test into the following table?

id a b c
1 5 7
2 8 3

I cannot pivot test directly, as I get the following error message at pivot(test): Table-valued function not found. Previous questions (1, 2) don't deal with nested columns or are outdated.

The following query looks like a useful first step:

select a.id, t
from `table` as a,
unnest(test) as t

However, this just provides me with:

id test.name test.score
1 a 5
1 b 7
2 a 8
2 c 3
Sergey Geron
  • 9,098
  • 2
  • 22
  • 29
ThomasW
  • 344
  • 3
  • 10

3 Answers3

5

Conditional aggregation is a good approach. If your tables are large, you might find that this has the best performance:

select t.id,
       (select max(tt.score) from unnest(t.score) tt where tt.name = 'a') as a,
       (select max(tt.score) from unnest(t.score) tt where tt.name = 'b') as b,
       (select max(tt.score) from unnest(t.score) tt where tt.name = 'c') as c
from `table` t;

The reason I recommend this is because it avoids the outer aggregation. The unnest() happens without shuffling the data around -- and I have found that this is a big win in terms of performance.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

One option could be using conditional aggregation

select id, 
       max(case when test.name='a' then test.score end) as a,
       max(case when test.name='b' then test.score end) as b,
       max(case when test.name='c' then test.score end) as c
from 
(
select a.id, t
from `table` as a,
unnest(test) as t
)A group by id
Fahmi
  • 37,315
  • 5
  • 22
  • 31
  • Thank you! I think this is already a great step, but the number of rows in each test could be different for each id and will at least change over time. Could this be done dynamically? – ThomasW Sep 21 '20 at 09:08
  • @ThomasW, you can check here - https://stackoverflow.com/questions/26272514/how-to-pivot-table-in-bigquery for the dynamic build – Fahmi Sep 21 '20 at 09:10
0

Below is generic/dynamic way to handle your case

EXECUTE IMMEDIATE (
  SELECT """
  SELECT id, """ || 
    STRING_AGG("""MAX(IF(name = '""" || name || """', score, NULL)) AS """ || name, ', ') 
  || """
  FROM `project.dataset.table` t, t.test
  GROUP BY id
  """
  FROM (
    SELECT DISTINCT name
    FROM `project.dataset.table` t, t.test
    ORDER BY name
  )
);  

If to apply to sample data from your question - output is

Row     id      a       b       c    
1       1       5       7       null     
2       2       8       null    3    
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230