Keep only first row given a set of columns in PostgreSQL

Question

I have a table that looks like this:

id	creation_date	col1	col2	col3
1	21 April	a	b	e
2	23 April	a	b	e
3	21 April	c	d	e

My goal is to delete all the duplicates, keeping only the first (ie first created) row.

The duplicates are defined using a set of columns from the table. For example, using col1 and col2, the result will be :

id	creation_date	col1	col2	col3
1	21 April	a	b	e
3	21 April	c	d	e

One answer that I found use one column as a key :

DELETE FROM dups a USING (
      SELECT MIN(ctid) as ctid, key
        FROM dups 
        GROUP BY key HAVING COUNT(*) > 1
      ) b
      WHERE a.key = b.key 
      AND a.ctid <> b.ctid

, but I can't see how to adapt to the case where multiple columns are used, and where the earliest record should be kept.

score 1 · Answer 1 · answered Apr 16 '22 at 15:03

1

You can try to use DELETE and EXISTS with your logic in subquery condition.

DELETE 
FROM dups t
WHERE EXISTS (SELECT 1 
               FROM dups tt
               WHERE tt.col1 =  t.col1
               AND tt.col2 =  t.col2
               AND tt.col3 =  t.col3
               HAVING MIN(tt.ctid) <> t.ctid
               );

sqlfiddle

answered Apr 16 '22 at 15:03

D-Shih

44,943
6
31
51

You can replace the HAVING clause with `AND tt.ctid < t.ctid` – Lennart - Slava Ukraini Apr 16 '22 at 15:11

score 1 · Answer 2 · answered Apr 16 '22 at 15:08

1

You can use an implicit self join of the table:

DELETE FROM dups d1
USING dups d2
WHERE (d2.col1, d2.col2, d2.col3) = (d1.col1, d1.col2, d1.col3)
  AND d2.creation_date < d1.creation_date;

See the demo.

answered Apr 16 '22 at 15:08

forpas

160,666
10
38
76

Keep only first row given a set of columns in PostgreSQL

2 Answers2