delete duplicate rows postgresql

Question

I just want to delete duplicated rows with the same name but I want to keep these ones with higher value than other.

The problem is - when I am executing code all the duplicate rows are deleted and no one is kept. Why so?


SELECT name from ( select name, ROW_NUMBER() OVER(
PARTITION_BY(name) ORDER BY sum DESC) AS rn FROM table ) t
WHERE t.rn > 1

gives me 1400 rows but

DELETE FROM table WHERE name IN 
(select company_name from ( select company_name, row_number() over (
partition by(company_name) order by sum_scoring desc) as rn from table ) t
where t.rn > 1)

deletes 2500 rows (I want to keep 1100 rows)

Your inner select returns nothing that would help separate "the latest" from "not the latest" duplicate, you just return a name of duplicated companies. Instead you should select a primary key and delete by it. — zerkms, Jun 21 '19 at 00:25

score 1 · Answer 1 · answered Jun 21 '19 at 07:30

As zerkms mentioned, you aren't differentiating the records enough.
You're just deleting all the records with names that are duplicated, you want to delete all but one.
This requires a primary key, or another unique column.
I'd recommend using rowid (ctid in postgres AFAIK)

DELETE FROM table 
WHERE ctid IN 
(
  select row_id_int
  from 
  ( 
    select company_name ctid as row_id_int, row_number() over ( partition by(company_name) order by 
      sum_scoring desc) as rn 
    from table
  ) t
  where t.rn > 1
)

Resources:
ROWID equivalent in POSTGRES
POSTGRES Documentation, system columns

score 0 · Answer 2 · answered Dec 14 '21 at 19:31

@johnathanjacobs answer is very helpful. Luckily (really) I have a id serial pk, so simplified the process a bit:

delete from table
where id in 
(
    select id
    from 
    (
        select id, smashup, row_number() over w1 as rn
        from table
        window w1 as (partition by smashup) 
    ) t
    where t.rn > 1
);

delete duplicate rows postgresql

2 Answers2

Linked