I have daily tables on BigQuery. How to query the "newest" one?

Question

I create daily tables. How to query the "newest" one?

For example GitHubArchive now publishes daily tables - instead of a monolithic one (see /r/bigquery/.../github_archive_changes_monthly_and_daily_tables/).

score 8 · Accepted Answer · edited May 23 '17 at 12:32

To find the latest table, you could use BigQuery's table query functions:

For example, the latest daily GitHubArchive table:

SELECT COUNT(*) num_records
FROM TABLE_QUERY(githubarchive:day,
  "table_id IN (
     SELECT table_id FROM githubarchive:day.__TABLES__ 
     ORDER BY creation_time DESC LIMIT 1)")

# 201859

For maximum convenience, you can store that query as a view to share with others. Then to query the latest table, just query the view:

SELECT num_records
FROM [fh-bigquery:public_dump.githubarchive_latest_day]

# 201859

For more on table query functions, see Jordan's answer to How do I use the TABLE_QUERY() function in BigQuery?.

score 3 · Answer 2 · answered Jul 05 '17 at 12:50

Felipe's answer uses Legacy SQL. Here's a solution in Standard SQL.

If your table names have a predictable suffix like GitHub's example in your link (which uses a YYYYMMDD suffix for each day), you can use wildcard tables and TABLE_SUFFIX:

SELECT COUNT(*)
FROM `githubarchive.day.events_*`
WHERE _TABLE_SUFFIX = (
  SELECT MAX(SUBSTR(table_id, -8))
  FROM `githubarchive.day.__TABLES_SUMMARY__`
  WHERE REGEXP_CONTAINS(table_id, "events_\\d{8}")
)

If you want to rely on BigQuery's metadata about creation time instead of the prefix, you can do:

SELECT COUNT(*)
FROM `githubarchive.day.*`
WHERE _TABLE_SUFFIX = (
  SELECT table_id
  FROM `githubarchive.day.__TABLES_SUMMARY__`
  ORDER BY creation_time DESC
  LIMIT 1
)

Note that if you have tables with different schemas in your project, the latter syntax is dangerous, since the query will use the schema of the first table it sees in the wildcard.

If you're trying to create tables that are partitioned by date, there's a BigQuery feature called date-partitioned tables that does just that, and makes it easier to query afterwards.

I have daily tables on BigQuery. How to query the "newest" one?

2 Answers2

Linked