1

I am trying to write a BigQuery SQL function / stored procedure / table function that accepts as input:

  • a INT64 filter for the WHERE clause,
  • a table name (STRING type) as fully qualified name e.g. project_id.dataset_name.table_name

The idea is to dynamically figure out the table name and provide a filter to slice the data to return as a table.

However if try to write a Table Function (TVF) and I use SET to start dynamically writing the SQL to execute, then I see this error:

Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword SET at [4:5]

If I try to write a stored procedure, then it expects BEGIN and END and throws this error:

Syntax error: Expected keyword BEGIN or keyword LANGUAGE but got keyword AS at [3:1]

If I try to add those, then I get various validation errors basically because I need to remove the WITH using CTEs (Common Table Expression), and semicolons ; etc.

But what I am really trying to do is using a table function:

  1. to combine some CTEs dynamically with those inputs above (e.g. the input table name),
  2. to PIVOT that data,
  3. to then eventually return a table as a result of a SELECT.

A bit like producing a View that could be used in other SQL queries, but without creating the view (because the slice of data can be decided dynamically with the other INT64 input filter).

Once I dynamically build the SQL string I would like to EXECUTE IMMEDIATE that SQL and provide a SELECT as a final step of the table function to return the "dynamic table".

The thing is that:

  • I don't know before runtime the name of this table.
  • But I have all these tables with the same structure, so the SQL should apply to all of them.

Is this possible at all?

This is the not-so-working SQL I am trying to work around. See what I am trying to inject with %s and num_days:

CREATE OR REPLACE TABLE FUNCTION `my_dataset.my_table_func_name`(num_days INT64, fqn_org_table STRING)
AS (
    -- this SET breaks !!!

    SET f_query = """
    WITH report_cst_t AS (
        SELECT
            DATE(start) as day,
            entity_id,
            conn_sub_type,
        FROM `%s` AS oa
        CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
        WHERE
            DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL num_days DAY)
            AND oa.entity_id IN ('my-very-long-id')
        ORDER BY 1, 2 ASC
    ),
    cst AS (
        SELECT * FROM
            (SELECT day, entity_id, report_cst_t FROM report_cst_t)
            PIVOT (COUNT(*) AS connection_sub_type FOR report_cst_t.conn_sub_type IN ('cat1', 'cat2','cat3' ))
    )
    """;

    -- here I would like to EXECUTE IMMEDIATE !!!

    SELECT
            cst.day,
            cst.entity_id,
            cst.connection_sub_type_cat1 AS cst_cat1,
            cst.connection_sub_type_cat2 AS cst_cat2,
            cst.connection_sub_type_cat3 AS cst_cat3,
    FROM cst
    ORDER BY 1, 2 ASC
);
TPPZ
  • 4,447
  • 10
  • 61
  • 106

1 Answers1

1

This might not be satisfying but since Procedural language or DDL are not allowed inside Table functions currently, one possible way around would be simply using PROCEDURE like below.

CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure(filter_value INT64, table_name STRING)
BEGIN
    EXECUTE IMMEDIATE FORMAT(CONCAT(
        "SELECT year, COUNT(1) as record_count, ",
        "FROM %s                                ",
        "WHERE year = %d                        ",
        "GROUP BY year                          ",
        ";                                      "
    ), table_name, filter_value);
END;

CALL my_dataset.temp_procedure(2002, 'bigquery-public-data.usa_names.usa_1910_current');
Jiho Choi
  • 1,083
  • 1
  • 10
  • 26