0

I'm not sure whether this is possible with some of the new BigQuery scripting capabilities, UDFs, array/string functions (or anything else!), however I simply can't figure it out.

I'm trying to write the SQL for a view in BigQuery which dynamically defines columns based on query results, similar to a pivot table in a spreadsheet/BI tool (or melt in pandas). I can do this externally in Python or hard-code it using case statements, but I'm sure that a SQL solution to this would be incredibly useful to a huge number of people.

Essentially I'm trying to write a query which would transform a table like this:

year | name    | number
-----------------------
1963 | Michael | 9246
1961 | Michael | 9055
1958 | Michael | 9203
1957 | Michael | 9116
1953 | Robert  | 9061
1952 | Robert  | 9205
1951 | Robert  | 9054
1948 | Robert  | 9015
1947 | Robert  | 10025
1947 | John    | 9634
1946 | Robert  | 9295
----------------------

SQL to generate initial example table:
SELECT   year, name, number
FROM     `bigquery-public-data.usa_names.usa_1910_2013` 
WHERE    number > 9000
ORDER BY year DESC

Into a table with the following structure:

year | John  | Michael | Robert
---------------------------------
1946 |       | 9,295   |       
1947 | 9,634 |         |  10,025
1948 |       | 9,015   |
...

This then needs to be connected to downstream tools, without requiring maintenance when the data changes. I know that this is not always a great idea and that tidy form data is more universally useful, but there are still some scenarios where this behaviour is desirable.

I have seen a few solutions on here, but they all seem to involve string generation and then manually pasting the query... I can do this via the BigQuery API but am desperate to find a dynamic solution using nothing but SQL so I don't have to maintain an external function.

Thanks in advance for any pointers!

JB_80
  • 57
  • 1
  • 7
  • Putting the names in rows rather than columns seems better for downstream tools. – Gordon Linoff Mar 27 '20 at 15:00
  • not possible for view! doable for just query though – Mikhail Berlyant Mar 27 '20 at 15:05
  • @MikhailBerlyant https://stackoverflow.com/questions/40761453/transpose-rows-into-columns-in-bigquery-pivot-implementation says you cannot do it in bq, do you mean that (if not a view) you can actually do that without string manipulation? if so, could you propose a way to do it? – Albert Albesa Apr 16 '20 at 07:19
  • 1
    @AlbertAlbesa - this question is for view. if you have your own question - please post it with all details and we will help you – Mikhail Berlyant Apr 16 '20 at 07:39
  • Thanks all, whilst I totally agree that it's better to have tidy data, my clients often have requirements like this so it's a common problem. Felipe Hoffa (https://stackoverflow.com/users/132438/felipe-hoffa) just posted an example of the opposite transformation (unpivot), using just BigQuery (via an SQL UDF) here: https://towardsdatascience.com/how-to-unpivot-multiple-columns-into-tidy-pairs-with-sql-and-bigquery-d9d0e74ce675, which might be of use to people viewing this question. For the record I think we're going to try and use DBT (https://www.getdbt.com/) instead which looks very cool! – JB_80 Apr 22 '20 at 13:43

0 Answers0