5

porting some stuff to bigquery, and come across an issue. We have a bunch of data with no unique key value. Unfortuantely some report logic requires a unique value for each row.

So in systems like Oracle I would just user the ROWNUM or ROWID psudeo columns.

In vertica, which doesn't have those psudeo columns I would use ROW_NUMBER() OVER(). But in bigquery that is failing with the error:

'dataset:bqjob_r79e7b4147102bdd7_0000016482b3957c_1': Resources exceeded during query execution: The query could not be executed in the allotted memory.
OVER() operator used too much memory..

The value does not have to be persistent, just a unique value within the query results.

Would like to avoid extract-process-reload if possible.

So is there any way to assign a unqiue value to query result rows in bigquery SQL?

Edit: Sorry, should have clarified. Using standard sql, not legacy

Sodved
  • 8,428
  • 2
  • 31
  • 43

1 Answers1

7

For ROW_NUMBER() OVER() to scale, you'll need to use PARTITION.

See https://stackoverflow.com/a/16534965/132438

#standardSQL
SELECT *
  , FORMAT('%i-%i-%i', year, month, ROW_NUMBER() OVER(PARTITION BY year, month)) id
FROM `publicdata.samples.natality` 

enter image description here

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325