I create daily tables. How to query the "newest" one?
For example GitHubArchive now publishes daily tables - instead of a monolithic one (see /r/bigquery/.../github_archive_changes_monthly_and_daily_tables/).
I create daily tables. How to query the "newest" one?
For example GitHubArchive now publishes daily tables - instead of a monolithic one (see /r/bigquery/.../github_archive_changes_monthly_and_daily_tables/).
To find the latest table, you could use BigQuery's table query functions:
For example, the latest daily GitHubArchive table:
SELECT COUNT(*) num_records
FROM TABLE_QUERY(githubarchive:day,
"table_id IN (
SELECT table_id FROM githubarchive:day.__TABLES__
ORDER BY creation_time DESC LIMIT 1)")
# 201859
For maximum convenience, you can store that query as a view to share with others. Then to query the latest table, just query the view:
SELECT num_records
FROM [fh-bigquery:public_dump.githubarchive_latest_day]
# 201859
For more on table query functions, see Jordan's answer to How do I use the TABLE_QUERY() function in BigQuery?.
Felipe's answer uses Legacy SQL. Here's a solution in Standard SQL.
If your table names have a predictable suffix like GitHub's example in your link (which uses a YYYYMMDD
suffix for each day), you can use wildcard tables and TABLE_SUFFIX
:
SELECT COUNT(*)
FROM `githubarchive.day.events_*`
WHERE _TABLE_SUFFIX = (
SELECT MAX(SUBSTR(table_id, -8))
FROM `githubarchive.day.__TABLES_SUMMARY__`
WHERE REGEXP_CONTAINS(table_id, "events_\\d{8}")
)
If you want to rely on BigQuery's metadata about creation time instead of the prefix, you can do:
SELECT COUNT(*)
FROM `githubarchive.day.*`
WHERE _TABLE_SUFFIX = (
SELECT table_id
FROM `githubarchive.day.__TABLES_SUMMARY__`
ORDER BY creation_time DESC
LIMIT 1
)
Note that if you have tables with different schemas in your project, the latter syntax is dangerous, since the query will use the schema of the first table it sees in the wildcard.
If you're trying to create tables that are partitioned by date, there's a BigQuery feature called date-partitioned tables that does just that, and makes it easier to query afterwards.