0

I have a table in BigQuery that has >1M rows. I got a duplicate with the same ID and the same data completely. Need to delete one of the two duplicates without recreating the table or changing anything on that table. Please help, thanks.

I did try to just delete the id but it will delete the both rows. Need to try with row_number but not sure how.

  • *'without [...] changing the table'* – not temporarily either? If you add and remove in a single transaction no one should notice. – Aconcagua Jul 26 '23 at 06:26
  • @Aconcagua creating a temporary table? – Insecupa Jul 26 '23 at 06:58
  • Initially thought about adding a temporary column, filling with unique ids (with some BigQuery equivalent to [here](https://stackoverflow.com/questions/9414826/sql-server-how-to-add-new-identity-column-and-populate-column-with-ids)) and delete those with (new) id not in the maxima of ids when grouping all the other columns. Though `select`ing `distinct` in a temporary table, dropping the old one and renaming the new one back might actually be faster, at very least much simpler to implement ;) – Aconcagua Jul 26 '23 at 07:19
  • And once you've solved define a primary key or at least some unique constraint such that duplicates cannot re-enter any more. – Aconcagua Jul 26 '23 at 07:21

2 Answers2

0

Consider below option (BigQuery Standard SQL)

CREATE OR REPLACE TABLE your_table AS 
SELECT *
FROM your_table t
QUALIFY ROW_NUMBER() OVER(PARTITION BY FORMAT('%t', t)) = 1;
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

Try this (tested for SQL Server. I don't know, is this work with BigQuery. Not works with Postgres. Only as example.)

--test data
create table test (id int,dt date,descr varchar(10));
insert into test values
 (1,'2023-07-26','test1-1')
,(1,'2023-07-26','test1-2')
,(1,'2023-07-26','test1-3')
,(2,'2023-07-26','test2-1')
,(3,'2023-07-26','test3-1')
;
-- query
with rng as (
     select * 
        ,row_number()over(partition by id,dt order by (select 1)) rn
     from test
)
delete from rng
where rn>1;

result

id dt descr
1 2023-07-26 test1-1
2 2023-07-26 test2-1
3 2023-07-26 test3-1
ValNik
  • 1,075
  • 1
  • 2
  • 6