7

I'm developing what is turning out to be quite a complex query that requires me to stack data (i.e UNION ALL) many times. To my surprise BigQuery doesn't like the stacking and the dry run shows exception:

Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.

I've isolated the point in the query where the problem arises to confirm that it appears to be one too many UNION ALL causing the problem. I'm surprised that the UNION ALL would do this, but I suspect I'm naive in my thinking here.

  1. Why isn't BigQuery able to handle this additional UNION ALL? Isn't stacking data one of the more straight forward operations?

  2. What are my options to achieve the same result? Is there an operation that I'm not aware of that could do the same job or an alternative method?

Here's the query in full, although I should note that project.dataset.source_view does do some relatively straight forward processing first:

WITH p0_funnel AS (
  SELECT  
    date, 
    platform_type, 
    platform, 
    flow,
    step_1, 
    step_2, 
    step_3, 
    step_4, 
    step_5, 
    step_6
  FROM `project.dataset.source_view`
), p1_funnel AS (
  SELECT
    date,
    flow,
    platform_type,
    platform,
    SUM(step_1) AS step_1, 
    SUM(step_2) AS step_2, 
    SUM(step_3) AS step_3, 
    SUM(step_4) AS step_4, 
    SUM(step_5) AS step_5, 
    SUM(step_6) AS step_6
  FROM p0_funnel
  GROUP BY 
    date, 
    flow,
    platform_type,
    platform
), p2_funnel AS (
  SELECT
    date,
    flow,
    platform,
    platform_type,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6
  FROM p1_funnel
), p3_funnel AS (
  SELECT
    date, platform, platform_type, flow,
    'step_1' AS step,
    step_1 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_2' AS step,
    step_2 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_3' AS step,
    step_3 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_4' AS step,
    step_4 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_5' AS step,
    step_5 AS step_sessions
  FROM p1_funnel

  UNION ALL

  SELECT
    date, platform, platform_type, flow,
    'step_6' AS step,
    step_6 AS step_sessions
  FROM p1_funnel
), p4_funnel AS (
  SELECT
    main.date,
    platform, platform_type, flow,
    step,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6,
    step_sessions
  FROM p3_funnel AS main
  JOIN p2_funnel USING(date, platform, platform_type, flow)

), funnel_platform_type AS (
  SELECT
    date,
    'platform_type' AS dimension,
    platform_type AS value,
    step,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6,
    step_sessions
  FROM p4_funnel
), funnel_platform AS (
  SELECT
    date,
    'platform' AS dimension,
    platform AS value,
    step,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6,
    step_sessions
  FROM p4_funnel
), funnel_flow AS (
  SELECT
    date,
    'flow' AS dimension,
    flow AS value,
    step,
    step_1,
    step_2,
    step_3,
    step_4,
    step_5,
    step_6,
    step_sessions
  FROM p4_funnel
), p5_funnel AS (
  SELECT * FROM funnel_platform_type UNION ALL
  SELECT * FROM funnel_platform UNION ALL
  SELECT * FROM funnel_flow # including this UNION ALL first introduces the problem
)

SELECT
  date,
  dimension,
  ROW_NUMBER() OVER (PARTITION BY dimension, step ORDER BY step_1 DESC) AS dim_order,
  value,
  step,
  CASE
    WHEN step = 'step_1' THEN 1
    WHEN step = 'step_2' THEN 2
    WHEN step = 'step_3' THEN 3
    WHEN step = 'step_4' THEN 4
    WHEN step = 'step_5' THEN 5
    WHEN step = 'step_6' THEN 6
    ELSE null
  END AS step_order,  
  CASE
    WHEN step = 'step_1' THEN step_2
    WHEN step = 'step_2' THEN step_3
    WHEN step = 'step_3' THEN step_4
    WHEN step = 'step_4' THEN step_5
    WHEN step = 'step_5' THEN step_6
    WHEN step = 'step_6' THEN null
    ELSE null
  END AS next_step_sessions,
  step_1,
  step_2,
  step_3,
  step_4,
  step_5,
  step_6,
  step_sessions
FROM p5_funnel
Steve-o169
  • 2,066
  • 1
  • 12
  • 21
goose
  • 2,502
  • 6
  • 42
  • 69
  • I would try to union each subquery one by one to find out which subquery Bigquery complains about. Then you can isolate the problem and ask the question again. – lovechillcool Dec 18 '18 at 17:40
  • 1
    @lovechillcool it's seemingly un-picky about which UNION ALL. Removing any solves the problem. – goose Dec 18 '18 at 17:48
  • If you have access to GCP support post this there as well. – Pentium10 Dec 18 '18 at 20:06
  • Thanks @Pentium10 I'll double check with our G-Suite admin, but I'm pretty sure we don't have GCP support. – goose Dec 18 '18 at 20:31
  • 1
    @Pentium10 think I've cracked the work around aspect of this (I've posted an answer to show what I did). If you're able shed any light on the "why it fails" part, I, and I'm sure others, would be curious to know at least at high level why it fails. Not sure if that's possible... – goose Dec 18 '18 at 22:40

3 Answers3

4

It is suggested to use temporary tables instead of a lot of WITH clauses, etc.. Breaking the query into a few simpler queries, and persisting the intermediate results into short-term tables or temporary tables should help to resolve this error.

The WITH clause contains one or more named subqueries which execute every time a subsequent SELECT statement references them. Any clause or subquery can reference subqueries you define in the WITH clause. This includes any SELECT statements on either side of a set operator, such as UNION.

The WITH clause is useful primarily for readability, because BigQuery does not materialize the result of the queries inside the WITH clause. If a query appears in more than one WITH clause, it executes in each clause.

  • 2
    Thanks for your answer. Can you explain a little further what you mean by temp tables? Is that saving interim results to a physical table set with expiry, or is there a diff syntax to use? – goose Jul 21 '20 at 11:52
  • 1
    @Goose: Bigquery introduced scripting recently which helps us in creating temporary tables. The life span of the temporary tables will be in the scope that script only. This link will help in creation of temporary tables: https://stackoverflow.com/questions/20673986/how-to-create-temporary-table-in-google-bigquery – Sridhar Pothamsetti Jul 26 '20 at 18:11
1

This happened because of the BigQuery subqueries quota. Not because of "UNION All". I faced the same problem when trying to execute a query with more than 125+ subqueries. So try to partition your query by subqueries count and insert every part in a temp table, then collect data from temp tables and drop them when you will finish.  

0

I think I've found an answer to (2).

I'm still un clear on (1) and I'm still interested. I can only presume it's due to the inner workings of BigQuery.

My solution to (1) uses a different query strategy. In a sense it separates the heavy lifting from the gymnastics by first collecting a grid of required dimensions using CROSS JOIN and then secondly simply JOIN to the dataset and trim down what's not required.

Here's what that looks like:

WITH p0_funnel AS (
  SELECT  
    date, 
    platform_type, 
    platform, 
    flow, 
    step_1, 
    step_2, 
    step_3, 
    step_4, 
    step_5, 
    step_6s AS step_6
  FROM `project.dataset.source`
), p1_funnel AS (
  SELECT
    flow,
    platform_type,
    platform,
    SUM(step_1) AS step_1, 
    SUM(step_2) AS step_2, 
    SUM(step_3) AS step_3, 
    SUM(step_4) AS step_4, 
    SUM(step_5) AS step_5, 
    SUM(step_6) AS step_6
  FROM p0_funnel
  GROUP BY 
    flow,
    platform_type,
    platform
), dimension_values AS (
  SELECT DISTINCT
    flow,
    platform_type,
    platform  
  FROM p1_funnel
), dimension_names AS (
  SELECT 'platform_type' AS dimension UNION ALL
  SELECT 'platform' UNION ALL
  SELECT 'flow'
), steps AS (
  SELECT 'step_1' AS step UNION ALL
  SELECT 'step_2' UNION ALL
  SELECT 'step_3' UNION ALL
  SELECT 'step_4' UNION ALL
  SELECT 'step_5' UNION ALL
  SELECT 'step_6'
), full_grid AS (
  SELECT
    dimension,
    step,
    flow,
    platform_type,
    platform  
  FROM dimension_values
  CROSS JOIN dimension_names
  CROSS JOIN steps
)

SELECT
  dimension,
  ROW_NUMBER() OVER (PARTITION BY dimension, step ORDER BY step_1 DESC) AS dim_order,
  CASE
    WHEN dimension = 'platform_type' THEN platform_type
    WHEN dimension = 'platform' THEN platform
    WHEN dimension = 'flow' THEN flow
  ELSE NULL END AS dim_value,
  step,  
  CASE
    WHEN step = 'step_1' THEN 1
    WHEN step = 'step_2' THEN 2
    WHEN step = 'step_3' THEN 3
    WHEN step = 'step_4' THEN 4
    WHEN step = 'step_5' THEN 5
    WHEN step = 'step_6' THEN 6
    ELSE null
  END AS step_order,  
  CASE
    WHEN step = 'step_1' THEN step_1
    WHEN step = 'step_2' THEN step_2
    WHEN step = 'step_3' THEN step_3
    WHEN step = 'step_4' THEN step_4
    WHEN step = 'step_5' THEN step_5
    WHEN step = 'step_6' THEN step_6
    ELSE null
  END AS step_sessions,   
  CASE
    WHEN step = 'step_1' THEN step_2
    WHEN step = 'step_2' THEN step_3
    WHEN step = 'step_3' THEN step_4
    WHEN step = 'step_4' THEN step_5
    WHEN step = 'step_5' THEN step_6
    WHEN step = 'step_6' THEN null
    ELSE null
  END AS next_step_sessions 
FROM full_grid
JOIN p1_funnel USING(platform_type, platform, flow)

CROSS JOIN seems to be very useful when used in the right place, although I'm still none too sure why UNION ALL couldn't have done the job. Self-evidently there are technical reasons why this is the case, I'm intellectually curious to know why, but at least there's a work around for this case and others like it.

goose
  • 2,502
  • 6
  • 42
  • 69