Is there any way to delete duplicates?

Question

I have a table called test_db and I want to delete all but one from the duplicates of the names column, this is my database:

ID | Names
---+------
1  | Phil
2  | John
3  | John
4  | James
5  | Phil
6  | Robert

And I want to keep this:

ID | Names
---+------
1  | Phil
2  | John
4  | James
6  | Robert

There are 1,609 questions on SO found with the query '`[sql] delete duplicate is:q`'. Sorted by votes, the top few entries cover [MySQL](https://stackoverflow.com/questions/4685173/), [MS SQL Server](https://stackoverflow.com/questions/18390574/), [PostgreSQL](https://stackoverflow.com/questions/1746213/), [SQLite](https://stackoverflow.com/questions/8190541/). I think you could have done some research (or show why these are not relevant). You could also have paid attention to the popups that ask you to identify the DBMS you are using, including the appropriate tag in the question. — Jonathan Leffler, Jun 17 '18 at 21:47

Gordon Linoff · Answer 1 · 2018-06-17T23:32:47.203

0

A canonical method in SQL is:

delete from test_db
    where id <> (select min(t2.id) from test_db t2 where t2.name = test_db.name);

In MySQL, you would do:

delete t
    from test_db t join
         (select t2.name, min(t2.id) as min_id
          from test_db t2
          group by t2.name
         ) t2
         on t2.name = t.name
    where t.id > t2.min_id;

edited Jun 17 '18 at 23:32

answered Jun 17 '18 at 20:00

Gordon Linoff

1,242,037
58
646
786

This query gives me this error message: `Table 'test_db' is specified twice, both as a target for 'DELETE' and as a separate source for data` – abc Jun 17 '18 at 21:10
1

Which shouldn't be a problem. Sounds like you are using MySQL which is notorious for having such issues. Gorson asked you an hour ago what DBMS you are using and you haven't answered. You cannot expect good answers, if you are not ready to give us the necessary information. For MySQL try to replace `from test_db t2 where t2.name = test_db.name` with `from (select * from test_db) t2 where t2.name = test_db.name`. – Thorsten Kettner Jun 17 '18 at 21:45
@abc . . . Hence why you should tag the question with the database you are using. – Gordon Linoff Jun 17 '18 at 23:33
sorry... didn't notice you asked for it until now :( i have added the tags – abc Jun 18 '18 at 13:45

score 0 · Answer 2 · answered Jun 17 '18 at 21:38

0

In Sql Server you can write the query as:

delete T
from test_db T
join (
select Id,
       ROW_NUMBER() over (partition by [Names] order by Id Asc) as rownum
from test_db ) T1
on T.Id = T1.Id
where T1.rownum >1

answered Jun 17 '18 at 21:38

Deepshikha

9,896
2
21
21

Man · Answer 3 · 2018-06-18T03:45:57.497

delete from test_db where test_db .name 
        in(select * from( select id from test_db where name 
                           in(  select name from test_db
                                 group by name having count(name) > 1  )
                           and id not in (  select min(id) from test_db
                                group by name having count(name) > 1  ))as tableA);

first sub-query of tableA return find all duplicate name.
second sub-query of tableA return choose one which do not want to delete. you may use max or min.
using in get all duplicate row And using not in subtract from all duplicate row to which do not want to delete in tableA

another way

delete from docs where id in(select t1.id from(select id  from docs where name 
  in(  select name from docs 
   group by name having count(name) > 1  )) as t1
  left join (  select id from docs 
   group by name having count(name) > 1  ) as t2
on t1.id=t2.id where t2.id is null);

Is there any way to delete duplicates?

3 Answers3