20

Is there something like an AUTO_INCREMENT, SERIAL, IDENTITY or sequence in BigQuery ?

I'm aware of ROW_NUMBER https://cloud.google.com/bigquery/query-reference#row-number
But I want to persist a generated unique ID for every row in my table.

JoseKilo
  • 2,343
  • 1
  • 16
  • 28
  • ROW_NUMBER would work, if you ran a query to compute a new "id" column for each row (and saved the result as your new table). That said, I'm curious *why* you want to do this -- BigQuery isn't really intended for single-row lookups by key (you have to scan the entire table) and I'd imagine some other combination of columns would make a more meaningful "identity" for the row. Is this just for consumption in another system downstream? – Adam Lydick Sep 08 '16 at 17:46
  • I want to be able to integrate data from several sources into the same table(s) in BQ. Those sources of data already have IDs, but they may clash with each other, so I want to have an unified ID for later access. Regarding access by ID, I was planning to keep a cached version of the map between old and new IDs in Datastore, so that most of the time I wouldn't need to do single-row lookups in BQ. – JoseKilo Sep 09 '16 at 09:04
  • Makes sense. I've added a few more suggestions. If this is just the union of your data sources, a namespace-prefixed key might be the easiest solution to ensure uniqueness (and will be more stable than ROW_NUMBER if you have to regenerate it later). – Adam Lydick Sep 09 '16 at 16:20
  • Sounds interesting. I will give it a try. Thanks :) – JoseKilo Sep 12 '16 at 08:26

1 Answers1

16

BigQuery does not have a notion of row key generation at load time. You could rewrite the table with a query to generate arbitrary keys for your rows.

As you noted, ROW_NUMBER would give you a unique index for each row, but you may hit size limits for particularly large tables (since you'd need an unpartitioned window function over everything).

If you can tolerate a larger string key, you might consider generating a UUID for each row (which can be done randomly and doesn't require coordination with the rest of your data). If you're using Standard SQL (and you should!) the GENERATE_UUID() function will accomplish this.

In the linked answer, Felipe constructs a composite key, which may also work for you, if the combination of your keys is distinct.

Adam Lydick
  • 1,092
  • 7
  • 15