0

I have a database that contains live production data, and in this database there is a table that stores relationship links (a.k.a column1 = table_a_id, column2 = table_b_id). It has come to my attention that when this table was created, a primary key or unique constraint across the two relationship IDs was never created, thus allowing for duplicate links to exist in the table. And as it turns out, as I'm trying to ALTER TABLE x ADD PRIMARY KEY(table_a_id, table_b_id), the alter fails because there are in fact a handful of duplicates.

I need to clear out the duplicates before I can institute the primary key, but I can't just use DELETE because any execution of DELETE that targets the duplicate will also target the row I want to keep.

Is there any way to delete a row that is an exact duplicate of another row across all columns without deleting the other row?

For example:

+------------+------------+-------------+
| table_a_id | table_b_id | other_data  |
+------------+------------+-------------+
|          1 |          1 | <some data> |
|          2 |          2 | <some data> |
|          3 |          2 | <some data> |
|          1 |          3 | <some data> |
|          3 |          2 | <some data> | // Duplicate row
+------------+------------+-------------+

In this example, how do I delete row 5 without affecting row 3?

Abion47
  • 22,211
  • 4
  • 65
  • 88
  • @LaurenzAlbe The linked question deals with removing duplicates based on a single unique ID column. My question has to do with a table that has two ID columns whose IDs aren't unique themselves (they should've been set up to be unique as a pair but they weren't) and are in fact foreign keys, so the answer in the link doesn't help me. – Abion47 Aug 28 '19 at 18:37
  • Then use the accepted answer from this question. This question comes up in some form almost every day. Why are so many people allergic to primary keys? – Laurenz Albe Aug 28 '19 at 18:41
  • @LaurenzAlbe I didn't set up the database, and in fact I'm trying to set up the primary key on the table now. The problem is I can't until I've dealt with the duplicates that currently already exist in the table. – Abion47 Aug 28 '19 at 18:44

1 Answers1

1

You can use the ctid system column to differentiate the rows:

DELETE FROM your_table t1
USING your_table t2
WHERE t1 = t2
AND t1.ctid > t2.ctid;
Jeremy
  • 6,313
  • 17
  • 20
  • Where does the second table come into play? – Abion47 Aug 28 '19 at 18:26
  • It's the same table; you do a self-join on the table to find rows that match, but have different ctids. – Jeremy Aug 28 '19 at 18:33
  • Is there a way to use `SELECT` using this syntax to preview what rows it would affect? (I'm being slightly paranoid as this is a table with production data.) – Abion47 Aug 28 '19 at 18:38
  • Yes, `select t1.* from your_table t1 join your_table t2 on t1 = t2 and t1.ctid < t2.ctid;` – Jeremy Aug 28 '19 at 20:23