1

Analytical Function Scalability in BigQuery

Quite a big setback our team has for transitioning to BigQuery is the scalability issues over analytical functions. I've read through numerous replies here, and they either say it cannot be done since sorting must be done on a single slot [1][2][3], to PARTITION [1][2][3], or to LIMIT [1][2][3][4][5].

Problem for us is that we need to be capable of doing this without partitioning and without limiting. We'd very much prefer not to have to export our data, do the analytical function in a different program, and then upload the data back to BigQuery each time.

Questions:

  • Is there a road map for improving the scalability of analytical functions in BigQuery?
  • Is it possible to allot more memory so that more data can fit on a single node?

Example:
Table size: 3.76 GB
Two columns: UNIQUE_ID & SALES
140 million rows

    SELECT 
        UNIQUE_ID
        , SALES
        , NTILE(10) OVER (ORDER BY SALES ASC) AS SALES_DECILE
    FROM `project.dataset.table`

Error: Resources exceeded during query execution: The query could not be executed in the allotted memory. OVER() operator used too much memory.

For my purposes, not having exactly 10% of records in each decile is okay, so APPROX_QUANTILES() does the trick, but I'm not sure if that's okay with other teams. The questions still stand when it comes to other analytical functions that need the exact amount. For example, I need to ROW_NUMBER() OVER() every record in a 140 million row table with one unique ID attribute that isn't numeric, so I'm unable to PARTITION at all. Otherwise, I'd totally do this suggestion.

Community
  • 1
  • 1
E Andren
  • 19
  • 2
  • usually, in such cases, my recommendation is to revisit the approach and think about why one would really need it? most likely this is not the end result but rather intermediate step to something else - so can one think of another ways of getting that result, etc. ...just saying ... - but obviously not having resource/memory limitation is good things to dream about :o) – Mikhail Berlyant Sep 26 '18 at 22:45
  • For feature requests and future plan discussions, please use the BigQuery issue tracker (https://issuetracker.google.com/savedsearches/559654). To solve "I need to ROW_NUMBER() OVER() every record in a 140 million row table with one unique ID attribute that isn't numeric" please post a new question with enough specifics so the community can help with that. – Felipe Hoffa Sep 27 '18 at 05:31

0 Answers0