
I am using Standard SQL. Even though it's a basic query, it is still throwing errors. Any suggestions, please?

SELECT 
  fullVisitorId,
  CONCAT(CAST(fullVisitorId AS string),CAST(visitId AS string)) AS session,
  date,
  visitStartTime,
  hits.time,
  hits.page.pagepath
FROM
  `XXXXXXXXXX.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN "20160801"
  AND "20170331"
ORDER BY
  fullVisitorId,
  date,
  visitStartTime
HKE

3 Answers


The only way to make this query work is to remove the ordering applied at the end:

SELECT 
  fullVisitorId,
  CONCAT(CAST(fullVisitorId AS string),CAST(visitId AS string)) AS session,
  date,
  visitStartTime,
  hits.time,
  hits.page.pagepath
FROM
  `XXXXXXXXXX.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN "20160801"
  AND "20170331"

The ORDER BY operation is quite expensive and cannot be processed in parallel, so try to avoid it (or apply it to a limited result set).
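If you do need the ordering, one way to keep it cheap is to bound the sorted result with a `LIMIT`. A sketch of that approach, using the same table pattern as above (the `LIMIT` value is an arbitrary illustration, not a recommendation):

```sql
-- Bounding the sorted output lets BigQuery complete the sort
-- without holding the full result set on a single node.
SELECT
  fullVisitorId,
  CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING)) AS session,
  date,
  visitStartTime,
  hits.time,
  hits.page.pagePath
FROM
  `XXXXXXXXXX.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN "20160801" AND "20170331"
ORDER BY
  fullVisitorId, date, visitStartTime
LIMIT 100000  -- arbitrary cap; choose what your use case needs
```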

Willian Fuks
  • Thanks Willian. It's working, but can you tell me why it was not working when I used ORDER BY? – HKE Sep 01 '17 at 21:30
  • There were too many rows to hold in memory on a single node. If you look at the "Explanation" tab for the query, it will show where it ran out of memory. – Elliott Brossard Sep 02 '17 at 09:08
  • Thanks @ElliottBrossard – HKE Sep 05 '17 at 12:30
  • I encountered the same issue. Even weirder, the query succeeded in the web UI but not via the Python API. Removing the ORDER BY clause solved the issue, but it's a bit odd to experience the discrepancy. – Rutger Hofste Jul 31 '18 at 13:58
  • I know this is a bit old, but does `OVER (PARTITION BY ...)` have the same effect? – Islam Azab Jan 28 '21 at 18:38
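On the last comment: a window function still sorts, but only within each partition, so the work can usually be spread across slots rather than concentrated on one node. A hedged sketch (assuming you want hits ranked per visitor rather than a single global ordering; very large partitions can still exhaust memory):

```sql
-- ROW_NUMBER orders rows within each fullVisitorId partition,
-- not globally, so BigQuery can distribute the sort across slots.
SELECT
  fullVisitorId,
  visitStartTime,
  hits.time,
  ROW_NUMBER() OVER (
    PARTITION BY fullVisitorId
    ORDER BY visitStartTime, hits.time
  ) AS hit_rank
FROM
  `XXXXXXXXXX.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN "20160801" AND "20170331"
```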

Besides the accepted answer, you might want to partition your table by date to reduce the amount of memory an expensive query consumes.

Embedded_Mugs
  • The above query pulls GA data, which is sharded by date by default; `_TABLE_SUFFIX BETWEEN "20160801" AND "20170331"` is how I pull data for different date ranges. – HKE Feb 12 '18 at 11:53

To avoid gathering a large chunk of data into a single slot, you can try:

  1. Split the data into small chunks before querying,
  2. Use a LIMIT clause with an ORDER BY operation, or
  3. Remove the ORDER BY operation from the query.
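Option 1 might look like restricting each query to a smaller `_TABLE_SUFFIX` window (here one month, purely illustrative) and then combining the per-chunk results downstream:

```sql
-- One month per query keeps each sort small enough for a single node.
-- Repeat with the next month's suffix range and concatenate the
-- results client-side (or write each chunk to its own table).
SELECT
  fullVisitorId,
  date,
  visitStartTime
FROM
  `XXXXXXXXXX.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN "20160801" AND "20160831"
ORDER BY
  fullVisitorId, date, visitStartTime
```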

Please refer to the GCP documentation regarding this error:
[1] https://cloud.google.com/bigquery/docs/best-practices-performance-output#use_a_limit_clause_with_large_sorts
[2] https://cloud.google.com/bigquery/docs/error-messages#resourcesExceeded

jaemunbro