I am currently attempting to run the following SQL script in BigQuery (with the goal of saving the table out) but am getting a SQL processing error relating to the capacity after I start the query without a row return size limit. The error is the common one: "Error: Resources exceeded during query execution."
SELECT date,
Concat(fullvisitorid, String(visitid)) AS unique_visit_id,
visitid,
visitnumber,
fullvisitorid,
totals.pageviews,
totals.bounces,
hits.page.pagepath,
hits.page.pagetitle,
device.devicecategory,
device.browser,
device.browserversion,
hits.customvariables.index,
hits.customvariables.customvarname,
hits.customvariables.customvarvalue,
hits.time
FROM (Flatten([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customvariables.index = 4
ORDER BY unique_visit_id DESC,
hits.time ASC
The job ID that was provided for the job is: ua-web-analytics:job_60fxNnmo9gZ23kaji50v3qH9dXs. I have read the other posts on the topic of these errors such as this post which focuses on the resource errors observed completing a join. I suspect that the issue right now is with the use of FLATTEN, and am working through some different approaches. That said, I am concerned because, in future, this query may be run on 30 or 60 days together (versus just the single day that I am prototyping on right now) which will dramatically increase the data size to over 500GB to 1TB. The goal of the above query was to generate a table which I could save out and then operate on. Unfortunately, doing this in an Ad Hoc manner seems somewhat problematic. Has anyone else encountered resource constraints when using a similar SQL query? For context, the table that is being queried over is about 17.2 GB in size, with just over a million rows.