I'm not sure whether this is possible with the newer BigQuery features (scripting, UDFs, array/string functions, or anything else!), but I simply can't figure it out.
I'm trying to write the SQL for a view in BigQuery which dynamically defines columns based on query results, similar to a pivot table in a spreadsheet/BI tool (or pivot in pandas). I can do this externally in Python or hard-code it using CASE statements, but I'm sure that a pure SQL solution would be incredibly useful to a huge number of people.
Essentially I'm trying to write a query which would transform a table like this:
year | name | number
-----------------------
1963 | Michael | 9246
1961 | Michael | 9055
1958 | Michael | 9203
1957 | Michael | 9116
1953 | Robert | 9061
1952 | Robert | 9205
1951 | Robert | 9054
1948 | Robert | 9015
1947 | Robert | 10025
1947 | John | 9634
1946 | Robert | 9295
----------------------
SQL to generate initial example table:
SELECT year, name, number
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE number > 9000
ORDER BY year DESC
Into a table with the following structure:
year | John  | Michael | Robert
---------------------------------
1946 |       |         | 9,295
1947 | 9,634 |         | 10,025
1948 |       |         | 9,015
...
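For reference, the hard-coded version I'm trying to avoid looks roughly like the sketch below. The three names are typed in by hand, and SUM is just an assumption on my part about how to combine rows if a name shows up more than once in the same year:

```sql
-- Hard-coded pivot: one conditional aggregate per name.
-- The name list is written out manually, which is the maintenance problem:
-- a new name passing the threshold will not get a column until the query is edited.
SELECT
  year,
  SUM(CASE WHEN name = 'John' THEN number END) AS John,
  SUM(CASE WHEN name = 'Michael' THEN number END) AS Michael,
  SUM(CASE WHEN name = 'Robert' THEN number END) AS Robert
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE number > 9000
GROUP BY year
ORDER BY year
```

This works, but the column list has to be updated whenever the set of names changes.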
This then needs to be connected to downstream tools, without requiring maintenance when the data changes. I know that this is not always a great idea and that tidy form data is more universally useful, but there are still some scenarios where this behaviour is desirable.
I have seen a few solutions on here, but they all seem to involve string generation and then manually pasting the query... I can do this via the BigQuery API but am desperate to find a dynamic solution using nothing but SQL so I don't have to maintain an external function.
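To illustrate what I mean by string generation, here is a rough sketch that builds the column list from the data and runs it in one go with BigQuery scripting. The variable name pivot_columns is made up for this example, and SUM is again an assumed aggregate. As far as I can tell this only runs as a script, so it can't be saved as a view, which is what I'm really after:

```sql
-- Holds the generated column list, e.g. "SUM(IF(name = 'John', ...)) AS `John`, ..."
DECLARE pivot_columns STRING;

-- Build one conditional aggregate per distinct name that passes the filter.
SET pivot_columns = (
  SELECT STRING_AGG(
    FORMAT("SUM(IF(name = '%s', number, NULL)) AS `%s`", name, name),
    ', ' ORDER BY name)
  FROM (
    SELECT DISTINCT name
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE number > 9000
  )
);

-- Splice the generated columns into the pivot query and execute it.
EXECUTE IMMEDIATE FORMAT("""
  SELECT year, %s
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE number > 9000
  GROUP BY year
  ORDER BY year
""", pivot_columns);
```

This removes the manual copy-and-paste step, but downstream tools would still need something to run the script rather than just querying a view.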
Thanks in advance for any pointers!