
I have a PostgreSQL table with a very large number of columns. The table does not have a primary key and now contains several rows that are 100% duplicates of another row.

How can I remove those duplicates without deleting the original along with them?

I found this answer on a related question, but I'd have to spell out each and every column name, which is error-prone. How can I avoid having to know anything about the table structure?

Example:

Given

create table duplicated (
 id int,
 name text,
 description text
);

insert into duplicated
values (1, 'A', null), 
       (2, 'B', null),
       (2, 'B', null),
       (3, 'C', null), 
       (3, 'C', null), 
       (3, 'C', 'not a DUPE!');

after deletion, the following rows should remain:

(1, 'A', null) 
(2, 'B', null)
(3, 'C', null) 
(3, 'C', 'not a DUPE!')
  • @a_horse_with_no_name: edited my question to explain the difference to the question you proposed as duplicate: one aspect I couldn't find there was how to avoid knowing the table structure. This rendered its answer inapplicable in my scenario. – blubb Oct 28 '19 at 16:00

1 Answer


As proposed in this answer, use the system column ctid to distinguish the physical copies of otherwise identical rows.

To avoid having to spell out a non-existent 'key' for the rows, simply use the row constructor row(table), which returns a composite value containing the entire row, as returned by select * from table:

DELETE FROM duplicated
USING (
    SELECT min(ctid) AS ctid, row(duplicated) AS row
    FROM duplicated
    GROUP BY row(duplicated)
    HAVING count(*) > 1
) uniqued
WHERE row(duplicated) = uniqued.row
  AND duplicated.ctid <> uniqued.ctid;
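If you'd like to see which physical rows the statement would remove before actually deleting anything, the same logic can be run as a plain SELECT first. This preview query is a sketch built from the DELETE above, not part of the original answer:

```sql
-- Preview: the ctids of the extra copies the DELETE above would remove.
SELECT duplicated.ctid, row(duplicated) AS full_row
FROM duplicated
JOIN (
    SELECT min(ctid) AS ctid, row(duplicated) AS row
    FROM duplicated
    GROUP BY row(duplicated)
    HAVING count(*) > 1
) uniqued
  ON row(duplicated) = uniqued.row
 AND duplicated.ctid <> uniqued.ctid;
```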

You can try it in this DbFiddle.
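To confirm the deletion worked without knowing the table structure either, the same row() trick gives a structure-agnostic check (a sketch, assuming the example table above):

```sql
-- Sanity check after deletion: should return zero rows,
-- since no full row should occur more than once anymore.
SELECT row(duplicated) AS full_row, count(*) AS copies
FROM duplicated
GROUP BY row(duplicated)
HAVING count(*) > 1;
```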
