0

I am working with BigQuery. I have two tables:

organisations:
  org_code STRING
  name STRING

spending:
  org STRING
  month DATE
  quantity INTEGER
  code STRING

And then quite a complicated query to get results by each organisation, by month:

SELECT
  organisations.org_code AS org,
  num.month AS month,
  (num.quantity / denom.quantity) AS ratio_quantity
FROM (
  SELECT
    org_code, name
  FROM
    [mytable.organisations]) AS organisations
LEFT OUTER JOIN EACH (
  SELECT
    org,
    month,
    SUM(quantity) AS quantity
  FROM
    [mytable.spending]
  GROUP BY
    org,
    month) AS denom
ON
  denom.org = organisations.org_code
LEFT OUTER JOIN EACH (
  SELECT
    org,
    month,
    SUM(quantity) AS quantity
  FROM
    [hscic.spending]
  WHERE
    code LIKE 'XXXX%'
  GROUP BY
    org,
    month) AS num
ON
  denom.month = num.month
  AND denom.org = num.org
ORDER BY org, month

My final results look like this, with a row per org/month combination:

org,month,ratio_quantity
A81001,2015-10-01 00:00:00 UTC,28
A82001,2015-11-01 00:00:00 UTC,43
A82002,2015-10-01 00:00:00 UTC,16

Now I would like to pivot the results to look like this, with one row per month, and one column per organisation:

month,items.A81001,items.A82002...
2015-10-01 00:00:00 UTC,28,16
2015-11-01 00:00:00 UTC,43,...

Is this possible in the same BigQuery call? Or should I create a new table and pivot it from there? Or should I just do the reshaping in Python?

UPDATE: There are about 500,000 results, fyi.

Richard
  • 62,943
  • 126
  • 334
  • 542
  • A query that mimics the pandas.melt() function without using UDFs: https://stackoverflow.com/a/62057616/13632099 – Luna Huang May 28 '20 at 05:31

1 Answers1

1

Q. Is this possible in the same BigQuery call? Or should I create a new table and pivot it from there?

In general, you can use that “complicated query” as a subquery for extra logic to be applied to your current result. So, it is definitely doable. But code can quickly become un-manageable or hard to manage – so you can consider writing this result into new table and then pivot it from there

If you stuck with direction of doing pivot (the way you described in your question) - check below link to see detailed intro on how you can implement pivot within BigQuery.
How to scale Pivoting in BigQuery?
Please note – there is a limitation of 10K columns per table - so you are limited with 10K organizations.
You can also see below as simplified examples (if above one is too complex/verbose):
How to transpose rows to columns with large amount of the data in BigQuery/SQL?
How to create dummy variable columns for thousands of categories in Google BigQuery?
Pivot Repeated fields in BigQuery

Q. Or should I just do the reshaping in Python?

If above will not work for you – pivoting on client is always an option but now you should consider client side limitations

Hope this helped!

Community
  • 1
  • 1
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230