I'm working in BigQuery on Google Analytics data. At various points in developing the query I get the error: "Resources exceeded". I want to further my understanding of what's happening. I've successfully worked around the problem, but only via trial and error.
When I use the explain tool it seems to be the 'compute' part of any query or sub-query that looks to have exceeded resources.
Here's an example of a standard SQL query that succeeds/fails depending on whether certain parts are left in:
SELECT
fullVisitorId,
visitId,
h.type AS type,
h.hitNumber AS hitNumber,
h.eventInfo.eventAction AS action,
LOWER(h.eventInfo.eventCategory) AS category,
h.page.pagePath AS page,
h.page.pageTitle AS landingTitle,
h.page.searchKeyword AS searchTerm,
LEAD(h.page.pagePath) OVER (PARTITION BY fullVisitorId, visitId ORDER BY h.hitNumber ASC) AS landingPage,
SPLIT(h.eventInfo.eventLabel, ':')[OFFSET(0)] AS clickTitle,
CASE WHEN LEAD(h.page.pageTitle) OVER (PARTITION BY fullVisitorId, visitId ORDER BY h.hitNumber ASC) = SPLIT(h.eventInfo.eventLabel, ':')[OFFSET(0)] THEN true ELSE false END AS searchClick
FROM `project.dataset.ga_sessions_*` AS main, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170430'
AND (
(
h.eventInfo.eventAction = 'click' AND LOWER(h.eventInfo.eventCategory) LIKE '/search%'
)
OR type = 'PAGE'
)
ORDER BY
fullVisitorId ASC, visitId ASC, h.hitNumber ASC
When removing any one of these sets of elements the query runs:
ORDER BY
fullVisitorId ASC, visitId ASC, h.hitNumber ASC
Or:
LEAD(h.page.pagePath) OVER (PARTITION BY fullVisitorId, visitId ORDER BY h.hitNumber ASC) AS landingPage,
SPLIT(h.eventInfo.eventLabel, ':')[OFFSET(0)] AS clickTitle,
CASE WHEN LEAD(h.page.pageTitle) OVER (PARTITION BY fullVisitorId, visitId ORDER BY h.hitNumber ASC) = SPLIT(h.eventInfo.eventLabel, ':')[OFFSET(0)] THEN true ELSE false END AS searchClick
Or:
When running on a single date partition the entire query runs.
I would describe my current level of understanding as superficial, I know little of the inner workings of GBQ and how it allocates/permits compute resources. I do know that it performs calculations on separate machines where possible. I've heard these described as shards before.
What do I need to know about GBQ compute resources in order to understand why the above will work/not work?
N.B: I only have Tier 1 access, but that doesn't mean I can't gain increased access if I can justify a need. Obviously I don't want to do this with current level of understanding.