1

we have a data set in Big Query with more than 500000 tables, when we run queries against this data set using legacy SQL, its throwing an error

As per Jordan Tigani, it executes SELECT table_id FROM .TABLES_SUMMARY to get relevant tables to query How do I use the TABLE_QUERY() function in BigQuery?

Does queries using _TABLE_SUFFIX(standard SQL) executes TABLES_SUMMARY to get relevant tables to query?

SaiVijay
  • 15
  • 1
  • 4
  • 1
    "its throwing an error" - what query, what error? – Felipe Hoffa May 14 '20 at 08:15
  • Hi Felipe, It was throwing the below error Reason: resourcesExceeded, Message: Resources exceeded during query execution: Too many tables in dataset metatable :.__TABLES_SUMMARY__, exceeded 500000 limit. – SaiVijay May 14 '20 at 08:52

2 Answers2

3

According to the documentation TABLE_SUFFIX is a pseudo column that contains the values matched by the table wildcard and it is olny available in StandardSQL. Meanwhile, __TABLE_SUMMARY_ is a meta-table that contains information about the tables within a dataset and it is available in Standard and Legacy SQL. Therefore, they have two different concepts.

However, in StandardSQL, you can use INFORMATION_SCHEMA.TABLES to retrieve information about the tables within the chosen dataset, similarly to __TABLE_SUMMARY_. Here you can find examples of usage and also its limitations.

Below, I queried against a public dataset using both methods:

First, using INFORMATION_SCHEMA.TABLES.

SELECT * FROM `bigquery-public-data.noaa_gsod.INFORMATION_SCHEMA.TABLES` 

And part of the output:

enter image description here

Secondly, using __TABLES_SUMMARY__.

SELECT * FROM `bigquery-public-data.noaa_gsod.__TABLES_SUMMARY__`

And part of the output table,

enter image description here

As you can see, for each method the output has a particular. Even though, both retrieve metadata about the tables within a particular dataset.

NOTE: BigQuery's queries have quotas. This quotas applies for some situations, including for the number of tables a single query can reference, which is 1000 per query, here.

Alexandre Moraes
  • 3,892
  • 1
  • 6
  • 13
  • Hi Alexandre, thanks for responding. but its not answering my question. when you run queries on legacy using **TABLE_QUERY**, by default it will execute __TABLES_SUMMARY__ to get table_ids and then run actual query. means extra query will run internally similarly, when you run queries on standard using **_TABLE_SUFFIX**, is there any extra query will run to get table_ids(like as you have given INFORMATION_SCHEMA.TABLES)?? – SaiVijay May 14 '20 at 12:49
  • Hi @SaiVijay, in StandardSQL, `INFORMATION_SCHEMA.TABLES` retrieves the metadata for each table inside a dataset. When you run it you can check the job execution details to inspect how the job behaves. In addition, you can also see how much data it will scan using this method. Meanwhile, _TABLE_SUFFIX does not give you any metadata about the table, it just selects/returns the wild all the tables matched by the wildcard. If you run `SELECT _TABLE_SUFFIX as suffix FROM bigquery-public-data.noaa_gsod.*`, it won't scan any data because it is a pseudo column. Did I answer your question? – Alexandre Moraes May 14 '20 at 14:10
0

No, querying using wildcard table does not execute TABLES_SUMMARY. You can have more than 500k tables in the dataset, but it does require that the number of tables matching the prefix pattern to be less than 500k. For other limitations on wildcard tables you can refer to the documentation.

Hua Zhang
  • 1,521
  • 11
  • 11