ROWNUM or ROWID in google bigquery

Question

porting some stuff to bigquery, and come across an issue. We have a bunch of data with no unique key value. Unfortuantely some report logic requires a unique value for each row.

So in systems like Oracle I would just user the ROWNUM or ROWID psudeo columns.

In vertica, which doesn't have those psudeo columns I would use ROW_NUMBER() OVER(). But in bigquery that is failing with the error:

'dataset:bqjob_r79e7b4147102bdd7_0000016482b3957c_1': Resources exceeded during query execution: The query could not be executed in the allotted memory.
OVER() operator used too much memory..

The value does not have to be persistent, just a unique value within the query results.

Would like to avoid extract-process-reload if possible.

So is there any way to assign a unqiue value to query result rows in bigquery SQL?

Edit: Sorry, should have clarified. Using standard sql, not legacy

score 7 · Accepted Answer · answered Jul 10 '18 at 06:01

7

For ROW_NUMBER() OVER() to scale, you'll need to use PARTITION.

See https://stackoverflow.com/a/16534965/132438

#standardSQL
SELECT *
  , FORMAT('%i-%i-%i', year, month, ROW_NUMBER() OVER(PARTITION BY year, month)) id
FROM `publicdata.samples.natality`

answered Jul 10 '18 at 06:01

Felipe Hoffa

54,922
16
151
325

1

Perfect thanks. Should have though of that myself. Very much appreciated. – Sodved Jul 10 '18 at 06:05

ROWNUM or ROWID in google bigquery

1 Answers1