I am using postgresql. To delete duplicates in a table I use this query:
DELETE FROM dups a USING (
SELECT MIN(ctid) as ctid, key
FROM dups
GROUP BY key HAVING COUNT(*) > 1
) b
WHERE a.key = b.key
AND a.ctid <> b.ctid
Reference: https://stackoverflow.com/a/12963112/4940278
However, there is a table say ref_table
where the dups.id
are referenced as well. I need to update the other table before deleting the duplicates.
What is the query to update the reference table with the duplicate's id, so that there is no data loss?
For example:
Table 1, say dups
id key
1 Luna
2 Hermione
3 Luna
Table 2, say ref_table
id dups_id data
1 2 Auror
2 1 Researcher
Now the query to delete the duplicates will delete the record with id 1 in dups table as it is a duplicate.
However that record is referenced in the ref_table
, so I need to update that with the record that is going to be retained.
i.e) Table1 should become:
id key
2 Hermione
3 Luna
and Table 2 should become:
id dups_id data
1 2 Auror
2 3 Researcher