I'm developing what is turning out to be quite a complex query that requires me to stack data (i.e UNION ALL
) many times. To my surprise BigQuery doesn't like the stacking and the dry run shows exception:
Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.
I've isolated the point in the query where the problem arises to confirm that it appears to be one too many UNION ALL
causing the problem. I'm surprised that the UNION ALL
would do this, but I suspect I'm naive in my thinking here.
Why isn't BigQuery able to handle this additional
UNION ALL
? Isn't stacking data one of the more straight forward operations?What are my options to achieve the same result? Is there an operation that I'm not aware of that could do the same job or an alternative method?
Here's the query in full, although I should note that project.dataset.source_view
does do some relatively straight forward processing first:
WITH p0_funnel AS (
SELECT
date,
platform_type,
platform,
flow,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6
FROM `project.dataset.source_view`
), p1_funnel AS (
SELECT
date,
flow,
platform_type,
platform,
SUM(step_1) AS step_1,
SUM(step_2) AS step_2,
SUM(step_3) AS step_3,
SUM(step_4) AS step_4,
SUM(step_5) AS step_5,
SUM(step_6) AS step_6
FROM p0_funnel
GROUP BY
date,
flow,
platform_type,
platform
), p2_funnel AS (
SELECT
date,
flow,
platform,
platform_type,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6
FROM p1_funnel
), p3_funnel AS (
SELECT
date, platform, platform_type, flow,
'step_1' AS step,
step_1 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_2' AS step,
step_2 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_3' AS step,
step_3 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_4' AS step,
step_4 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_5' AS step,
step_5 AS step_sessions
FROM p1_funnel
UNION ALL
SELECT
date, platform, platform_type, flow,
'step_6' AS step,
step_6 AS step_sessions
FROM p1_funnel
), p4_funnel AS (
SELECT
main.date,
platform, platform_type, flow,
step,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p3_funnel AS main
JOIN p2_funnel USING(date, platform, platform_type, flow)
), funnel_platform_type AS (
SELECT
date,
'platform_type' AS dimension,
platform_type AS value,
step,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p4_funnel
), funnel_platform AS (
SELECT
date,
'platform' AS dimension,
platform AS value,
step,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p4_funnel
), funnel_flow AS (
SELECT
date,
'flow' AS dimension,
flow AS value,
step,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p4_funnel
), p5_funnel AS (
SELECT * FROM funnel_platform_type UNION ALL
SELECT * FROM funnel_platform UNION ALL
SELECT * FROM funnel_flow # including this UNION ALL first introduces the problem
)
SELECT
date,
dimension,
ROW_NUMBER() OVER (PARTITION BY dimension, step ORDER BY step_1 DESC) AS dim_order,
value,
step,
CASE
WHEN step = 'step_1' THEN 1
WHEN step = 'step_2' THEN 2
WHEN step = 'step_3' THEN 3
WHEN step = 'step_4' THEN 4
WHEN step = 'step_5' THEN 5
WHEN step = 'step_6' THEN 6
ELSE null
END AS step_order,
CASE
WHEN step = 'step_1' THEN step_2
WHEN step = 'step_2' THEN step_3
WHEN step = 'step_3' THEN step_4
WHEN step = 'step_4' THEN step_5
WHEN step = 'step_5' THEN step_6
WHEN step = 'step_6' THEN null
ELSE null
END AS next_step_sessions,
step_1,
step_2,
step_3,
step_4,
step_5,
step_6,
step_sessions
FROM p5_funnel