The error happens because BigQuery limits the size of a scripting variable to 1 MB.
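For illustration (a hypothetical sketch; the column list and the idea of loading the results into an array variable are assumptions on my side, not something from your question), this is the kind of statement that hits that limit:

-- Hypothetical illustration: loading a whole result set into a scripting variable
-- fails with a resource error once the variable grows past 1 MB
DECLARE all_rows ARRAY<STRUCT<id INT64, display_name STRING>>;
SET all_rows = ARRAY(
  SELECT AS STRUCT id, display_name
  FROM `bigquery-public-data.stackoverflow.users` -- large table, so the array exceeds 1 MB
);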
Regarding the loop to process the data in batches, you can use the following SQL as a reference:
DECLARE offset_ INT64 DEFAULT 1; -- OFFSET starts at 1, based on ROW_NUMBER()
DECLARE limit_ INT64 DEFAULT 500; -- Size of each chunk to be processed
DECLARE size_ INT64 DEFAULT 7000; -- Total number of rows (used in the WHILE condition)

-- Table to be processed. I'm creating this temporary table only to use it as an example
CREATE TEMPORARY TABLE IF NOT EXISTS data_numbered AS (
  SELECT *, ROW_NUMBER() OVER() AS row_number
  FROM (SELECT * FROM `bigquery-public-data.stackoverflow.users` LIMIT 7000)
);

-- WHILE loop
WHILE offset_ <= size_ DO -- <= so the last row is still processed when it starts a new chunk
  IF offset_ = 1 THEN -- OPTIONAL, create the temporary table in the first iteration
    CREATE OR REPLACE TEMPORARY TABLE temp_table AS (
      SELECT * FROM data_numbered
      -- Use offset_ and limit_ to control the chunk of data;
      -- the -1 is because BETWEEN is inclusive, which avoids duplicated rows at the edges
      WHERE row_number BETWEEN offset_ AND offset_ + limit_ - 1
    );
  ELSE
    -- Same query as above. Each iteration fills the temporary table with the next chunk:
    -- 501 - 1000, 1001 - 1500, ...
    INSERT INTO temp_table (
      SELECT * FROM data_numbered
      WHERE row_number BETWEEN offset_ AND offset_ + limit_ - 1
    );
  END IF;
  -- Move the offset to the next chunk
  SET offset_ = offset_ + limit_;
END WHILE;
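After the loop finishes you can run a quick sanity check (just a minimal example; it has to run inside the same script, since temporary tables only live for the duration of the script):

-- Sanity check: temp_table should contain the same number of rows as data_numbered
SELECT
  (SELECT COUNT(*) FROM data_numbered) AS source_rows,
  (SELECT COUNT(*) FROM temp_table) AS copied_rows;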
One of the challenges of writing this loop is that you cannot use variables in the LIMIT and OFFSET clauses. So I used ROW_NUMBER() to create a column that the WHERE clause can filter on:
WHERE row_number BETWEEN offset_ AND offset_ + limit_ - 1
If you want to read more about ROW_NUMBER(), I recommend checking this SO answer.
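As an alternative sketch (not what I used above; it relies on dynamic SQL, so evaluate whether that fits your case), EXECUTE IMMEDIATE lets you build LIMIT and OFFSET into the query string, which sidesteps the restriction on variables in those clauses:

-- Alternative sketch with dynamic SQL: the LIMIT/OFFSET values are formatted into
-- the query string, so no variables appear directly in those clauses.
-- Note that OFFSET is 0-based here and an ORDER BY is needed for stable paging.
DECLARE offset_ INT64 DEFAULT 0;
DECLARE limit_ INT64 DEFAULT 500;
EXECUTE IMMEDIATE FORMAT("""
  SELECT id, display_name
  FROM `bigquery-public-data.stackoverflow.users`
  ORDER BY id
  LIMIT %d OFFSET %d
""", limit_, offset_);

You could put that EXECUTE IMMEDIATE inside the same WHILE loop instead of the BETWEEN filter, but each page still rescans the table, so the cost caveat below applies either way.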
Finally, if you want to use this approach, consider that there are some caveats: scripting is still a Beta feature, and you could run into quota issues depending on how often you insert data into your temporary table. Also, since the query changes in each iteration, the first time you run the script (when nothing is cached) the total bytes_processed will be roughly number_of_iterations * the byte size of the table; with limit_ = 500 and size_ = 7000 that is 14 iterations, so about 14 times the table's size.
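If I recall correctly, you can also check what the script has accumulated so far with the @@script system variables (treat the exact names as something to verify in the docs):

-- Rough cost check as the last statement of the script; the values accumulate
-- over the child jobs the script has executed so far
SELECT
  @@script.bytes_processed AS bytes_processed_so_far,
  @@script.bytes_billed AS bytes_billed_so_far;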