I have a database that contains live production data, and in this database there is a table that stores relationship links (a.k.a column1 = table_a_id
, column2 = table_b_id
). It has come to my attention that when this table was created, a primary key or unique constraint across the two relationship IDs was never created, thus allowing for duplicate links to exist in the table. And as it turns out, as I'm trying to ALTER TABLE x ADD PRIMARY KEY(table_a_id, table_b_id)
, the alter fails because there are in fact a handful of duplicates.
I need to clear out the duplicates before I can institute the primary key, but I can't just use DELETE
because any execution of DELETE
that targets the duplicate will also target the row I want to keep.
Is there any way to delete a row that is an exact duplicate of another row across all columns without deleting the other row?
For example:
+------------+------------+-------------+
| table_a_id | table_b_id | other_data |
+------------+------------+-------------+
| 1 | 1 | <some data> |
| 2 | 2 | <some data> |
| 3 | 2 | <some data> |
| 1 | 3 | <some data> |
| 3 | 2 | <some data> | // Duplicate row
+------------+------------+-------------+
In this example, how do I delete row 5 without affecting row 3?