I have a table with >70M rows of data and 2M of duplicates. I want to clean duplicates by keeping the recent original row.
I found a few solutions from here - link
In which, solutions are only to clean the duplicates and not retain the recent data among the duplicates.
here is another common solution:
;WITH cte
AS (SELECT Row_number() OVER (partition BY id ORDER BY
updatedAt
DESC,
status DESC) RN
FROM MainTable)
DELETE FROM cte
WHERE RN > 1
But it is not supported in BigQuery.