How to remove duplicate rows in postgresql?

Question

I would like to remove duplicate entries in Postgresql.

There is no unique constraint, but I would like to consider all columns together to consider a row as a duplicate.

So we have a table containing following rows :

id   |   name   | age           | started_date  |Score |
-----|----------|---------------|---------------|------|
1    | tom      | 15            | 01/06/2022    |5     |
2    | tom      | 15            | 01/06/2022    |5     |
3    | henry    | 10            | 01/06/2022    |4     |
4    | john     | 11            | 01/06/2022    |6     |
...

I would like to consider all columns together to identify the duplicate rows.

How to achieve this in Postgresql ?

What have you tried? What's your schema? What's the sample data? — David Brossard, Jun 10 '18 at 06:28
What do you mean by "remove"? In a query or in the data itself? — Gordon Linoff, Jun 10 '18 at 11:33
related question https://stackoverflow.com/questions/56695357/delete-duplicate-rows-postgresql — imel96, Apr 07 '22 at 00:44

score 7 · Answer 1 · answered Jun 10 '18 at 06:51

PostgreSQL assigns a ctid pseudo-column to identify the physical location of each row. You could use that to identify different rows with the same values:

-- Create the table
CREATE TABLE my_table (num1 NUMERIC, num2 NUMERIC);

-- Create duplicate data
INSERT INTO my_table VALUES (1, 2);
INSERT INTO my_table VALUES (1, 2);

-- Remove duplicates
DELETE FROM my_table
WHERE ctid IN (SELECT ctid
               FROM   (SELECT ctid,
                              ROW_NUMBER() OVER (
                                PARTITION BY num1, num2) AS rn
                       FROM   my_table) t
               WHERE  rn > 1);

DB Fiddle

score 0 · Answer 2 · answered Jun 10 '18 at 06:33

0

Let say your table has 2 columns, you can identify duplicates using. Post this :-

1) Insert this result into a temp table

2) Drop data from Main table

3) Insert data from temp table into main table

4) Drop temp table.

select col1, col2, count(*) as cnt
from table1
group by col1, col2
having  cnt > 1

answered Jun 10 '18 at 06:33

Gaurav

1,070
9
11

Is there a way to delete found duplicate data directly from the table? – yome Jun 10 '18 at 06:37
SQL mentioned in answer will do the same. – Gaurav Jun 11 '18 at 13:09

How to remove duplicate rows in postgresql?

2 Answers2