-1

I would like to remove duplicate entries in Postgresql.

There is no unique constraint, but I would like to consider all columns together to consider a row as a duplicate.

So we have a table containing following rows :

id   |   name   | age           | started_date  |Score |
-----|----------|---------------|---------------|------|
1    | tom      | 15            | 01/06/2022    |5     |
2    | tom      | 15            | 01/06/2022    |5     |
3    | henry    | 10            | 01/06/2022    |4     |
4    | john     | 11            | 01/06/2022    |6     |
...

I would like to consider all columns together to identify the duplicate rows.

How to achieve this in Postgresql ?

schlebe
  • 3,387
  • 5
  • 37
  • 50
yome
  • 2,853
  • 5
  • 20
  • 18

2 Answers2

7

PostgreSQL assigns a ctid pseudo-column to identify the physical location of each row. You could use that to identify different rows with the same values:

-- Create the table
CREATE TABLE my_table (num1 NUMERIC, num2 NUMERIC);

-- Create duplicate data
INSERT INTO my_table VALUES (1, 2);
INSERT INTO my_table VALUES (1, 2);

-- Remove duplicates
DELETE FROM my_table
WHERE ctid IN (SELECT ctid
               FROM   (SELECT ctid,
                              ROW_NUMBER() OVER (
                                PARTITION BY num1, num2) AS rn
                       FROM   my_table) t
               WHERE  rn > 1);

DB Fiddle

Mureinik
  • 297,002
  • 52
  • 306
  • 350
0

Let say your table has 2 columns, you can identify duplicates using. Post this :-

1) Insert this result into a temp table

2) Drop data from Main table

3) Insert data from temp table into main table

4) Drop temp table.

select col1, col2, count(*) as cnt
from table1
group by col1, col2
having  cnt > 1 
Gaurav
  • 1,070
  • 9
  • 11