Google BigQuery IF/While Loop

Question

DECLARE var1 INT64;
SET var1=(select * from abc.xyz);
{SOME OTHER OPERATIONS}

ERROR : variable quota exceeded.

To solve this issue, I want to run a batch Process where the count is taken from abc.xyz table and only 50000 records are processed in first batch and the result is stored in temp table. In next iteration the loop processes another 50000 and add them to the temp table.

How can this be done in google bigquery?

score 4 · Accepted Answer · answered Feb 10 '20 at 23:02

The error was because the limit size for a variable is 1MB.

Regarding the loop to process batches. You can use the following SQL as a reference:

DECLARE offset_ INT64 DEFAULT 1; -- OFFSET starts in 1 BASED on ROW NUMBER ()
DECLARE limit_ INT64 DEFAULT 500; -- Size of the chunks to be processed
DECLARE size_ INT64 DEFAULT 7000; -- Size of the data (used for the condition in the WHILE loop)

-- Table to be processed. I'm creating this new temporary table to use it as an example
CREATE TEMPORARY TABLE IF NOT EXISTS data_numbered AS (

  SELECT *, ROW_NUMBER() OVER()  row_number
    FROM (SELECT * FROM `bigquery-public-data.stackoverflow.users` LIMIT 7000)
);

-- WHILE loop
WHILE offset_ < size_ DO

    IF offset_ = 1 THEN -- OPTIONAL, create the temporary table in the first iteration

      CREATE OR REPLACE TEMPORARY TABLE temp_table AS (    
        SELECT * FROM data_numbered
          WHERE row_number BETWEEN offset_ AND offset_ + limit_ - 1 -- Use offset and limit to control the chunks of data 
       );

    ELSE

     -- This is the same query as above.
     -- Each iteration will fill the temporary table
     -- Iteration
     -- 501 - 1000
     -- 1001 - 1500
     -- ...

      INSERT INTO temp_table (
        SELECT * FROM data_numbered WHERE row_number BETWEEN offset_ AND offset_ + limit_ - 1 -- -1 because BETWEEN is inclusive, so it helps to avoid duplicated values in the edges
      );

    END IF;

    -- Adjust the offset_ variable
    SET offset_ = offset_ + limit_;

END WHILE;

One of the challenges while making this loop is that you cannot use variables in the LIMIT and the OFFSET clause. So, I used ROW_NUMBER() to create a column that I could use to process with the WHERE clause:

 WHERE row_number BETWEEN offset_ AND offset_ + limit_

If you want to read more about ROW_NUMBER() I recommend to check this SO answer.

Finally, if you want to use this approach, consider that there are some caveats like scripting being a Beta feature, and possible quota issues depending on how often you insert data into your temporary table. Also, since the query changes in each iteration, the first time you run it, and it is not cached, the bytes_processed will be number_of_iterations*byte_size of the table

Google BigQuery IF/While Loop

1 Answers1