1

Whem I am running the following query, I get a 'resource limited exceeded'-error. If I remove the last line (the order by clause) it works:

SELECT
  id,
  INTEGER(-position / (CASE WHEN fallback = 0 THEN 2 ELSE 1 END)) AS major_sort
FROM (
  SELECT
    id,
    fallback,
    ROW_NUMBER() OVER(PARTITION BY fallback) AS position
  FROM
    [table] AS r
  ORDER BY
    r.score DESC ) AS r
ORDER BY major_sort DESC

Actually the entire last line would be:

ORDER BY major_sort DESC, r.score DESC

But neither that would probably make things even worse.

Any idea how I could change the query to circumvent this problem?

((If you wonder what this query does: the table contains a 'ranking' with multiple fallback strategies and I want to create an ordering like this: 'AABAABAABAAB' with 'A' and 'B' being the fallback strategies. If you have a better idea how to achieve this; please feel free to tell me :D))

Nils Ziehn
  • 4,118
  • 6
  • 26
  • 40

1 Answers1

2

A top-level ORDER BY will always serialize execution of your query: it will force all computation onto a single node for the purpose of sorting. That's the cause of the resources exceeded error.

I'm not sure I fully understand your goal with the query, so it's hard to suggest alternatives, but you might consider putting an ORDER BY clause within the OVER(PARTITION BY ...) clause. Sorting a single partition can be done in parallel and may be closer to what you want.

More general advice on ordering:

  • Order is not preserved during BQ queries, so if there's an ordering that you want to preserve on the input rows, make sure it's encoded in your data as an extra field.

  • The use cases for large amounts of globally-sorted data are somewhat limited. Often when users run into resource limitations with ORDER BY, we find that they're actually looking for something slightly different (locally ordered data, or "top N"), and that it's possible to get rid of the global ORDER BY completely.

Jeremy Condit
  • 6,766
  • 1
  • 28
  • 30
  • Thx for the answer. Unfortunately we do need the entire set sorted, but I cloud also in post processing. Do you know whether there are any plans to show users 'how close' we are to the resource limit? I'm a little scared that the query works for now, but as soon as our business expands our queries won't work anymore because we hit the limits. – Nils Ziehn Sep 10 '15 at 14:37
  • 2
    How much data are we talking about, and why does it need to be completely sorted? (And to answer your question, there are no current plans to report how close you are to the limit, but we'll keep the feature request in mind.) – Jeremy Condit Sep 10 '15 at 15:57
  • The last SELECT statement is supposed to sort about 6mio rows and 10 colums (6doubles, 4 short strings) – Nils Ziehn Sep 10 '15 at 17:00
  • 1
    That's likely getting close to the limit of what we can sort with a single ORDER BY. Why do you need the data sorted? What do you do with the sorted data that comes out of this query? – Jeremy Condit Sep 10 '15 at 20:49
  • We will have to change some post processing algos, but I guess that's fine. What I don't get: When I just use ROW_NUMBER() OVER(ORDER BY ...) in the top level query the data is actually return sorted in the correct order but we don't hit resource limits. I would have assumed this would require even more resources. – Nils Ziehn Sep 10 '15 at 21:03
  • 1
    It's possible that ROW_NUMBER() OVER (ORDER BY ...) uses a slightly different code path that is more memory-efficient. But if you don't have a PARTITION BY clause, then the sorting is still being done on a single node, and you'll hit a resource limit sooner or later. – Jeremy Condit Sep 10 '15 at 22:15