I have BigQuery table with columns col1, col2, col3. I want to delete rows that have duplicated values in col1 but keep the last (The row that was last pushed to BigQuery). col2 and col3 does not have to be duplicates.
My main problem is that I cannot find the last column. I tried below query, ordering was not right. But when I just SELECT all the rows the ordering is from the oldest to the newest rows.
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1) AS row_numb
FROM table
I saw other solutions but they ordered by some column timestamp/created_at that I do not have. I know that one solution would be to add column with timestamp and then order by it to get most recent row. But is there any other way?
Example:
col1 | col2 | col3 |
---|---|---|
1 | 2 | 3 |
2 | 3 | 1 |
1 | 4 | 3 |
The last row (col1 = 1, col2 = 4, col3 = 3) was added last to BigQuery So what I want is to find duplicates in first column (That would be 1. and 3. row) and delete all the duplicates except the one that was added last to BigQuery (That is the 3. row).
The result would be
col1 | col2 | col3 |
---|---|---|
2 | 3 | 1 |
1 | 4 | 3 |