How to delete duplicates from one table, but keeping only one record?

Question

Hello SQL query experts! I have one table called 'mytable' which has 2 columns such as id and title . I tried to remove duplicates except only one record(row) comparing title. Below was my choice:

DELETE FROM `myTable` AS `m1`
WHERE `m1`.`id` 
NOT IN (SELECT MIN(`b`.`id`) as `recordid` FROM `myTable` AS `b` GROUP BY `b`.`title`)

error : Error in query (1064): Syntax error near '* FROM `myTable` AS `m1` WHERE `m1`.`id` NOT IN (SELECT MIN(`b`.`id`) as `reco' at line 1 but I faced a trouble and tried to resolve this problem more than 2 hours. It seems like very simple problem. But I can't figure it out. So I am asking to stackoverflow!

And mainly, I see something strange. I tried like this but it has no any error.

SELECT * FROM `myTable` AS `m1`
WHERE `m1`.`id` 
NOT IN (SELECT MIN(`b`.`id`) as `recordid` FROM `myTable` AS `b` GROUP BY `b`.`title`)

When I run this query, I can obtain the list of records(rows) I want to delete from 'myTable' table.

Why do I face a deletion problem although I can obtain the list to delete?

I need your help really. Thanks everyone!

Could this help ? https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group — Strauteka, Feb 18 '20 at 05:22
look this :-https://stackoverflow.com/questions/1286843/how-to-keep-only-one-row-of-a-table-removing-duplicate-rows — Malakiya sanjay, Feb 18 '20 at 05:23
Does this answer your question? [Select first row in each GROUP BY group?](https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) — Serg, Feb 18 '20 at 06:25
@Malakiyasanjay Thanks for your help. It seems like I find out the solution. But it will take huge times to run my new query. So I can't give correct answer if your suggestion is so fit to my question now. — Leon, Feb 18 '20 at 08:57

score 0 · Answer 1 · answered Feb 18 '20 at 13:03

You can phrase this as:

delete m
    from mytable m left join
         (select m2.title, min(m2.id) as min_id
          from mytable m2
          group by m2.title
         ) m2
         on m.title = m2.title and m.id > m.min_id;

For performance, you want in index on (title, id).

score 0 · Answer 2 · answered Feb 18 '20 at 13:16

I think Gordon's answer lays the gist. Recently had to do something similar, ended up with this (applied to your situation):

DELETE FROM mytable WHERE id IN (
    SELECT *
    FROM (
        SELECT m.id
        FROM my_table m
        WHERE m.id NOT IN (
            SELECT MAX(m.id)
            FROM my_table sub
            GROUP BY sub.title
            HAVING COUNT(sub.title) > 1
        )
        AND m.id NOT IN (
            SELECT MAX(sub2.id)
            FROM my_table sub2
            GROUP BY sub2.title
            HAVING COUNT(sub2.title) = 1
        )
  ) AS m
)

The extra wrapper was necessary (if I remember correctly) because sub-query was not allowed in a DELETE statement (but could be used like shown).

This will remove all the records, by ID, that have a count (of title) greater than 0, but will not remove the latest (max) record.

NOTE: this is a very intensive query. Indexes on ID & Title are recommended and even then: sloooowwww. Ran this through just 100k records with indexes and still takes about 10 seconds.

score 0 · Answer 3 · answered Feb 18 '20 at 13:27

The syntax:

DELETE FROM `myTable` AS `m1`

is wrong.
It should be:

DELETE m1 FROM `myTable` AS `m1`

but you don't need to alias the table, you can just do

DELETE FROM `myTable`

Also MySql does not allow the direct use of the target table inside a subquery like the one you use with NOT IN, but you can overcome this limitation by enclosing the subquery inside another one:

DELETE FROM `myTable` 
WHERE `id` NOT IN (
  SELECT `recordid`
  FROM (
    SELECT MIN(`id`) as `recordid` 
    FROM `myTable` 
    GROUP BY `title`
  ) t  
)

I removed the aliases of the nested subquery because they are not needed.

Leon · Accepted Answer · 2020-02-28T04:46:14.937

I found out the exact reason of issue I faced finally. I referenced the comment of @Malakiyasanjay. you can find that from here How to keep only one row of a table, removing duplicate rows?

I tried like this: (and it worked for me as well but it took a lot of time to run the query for 30,000 rows)

delete from myTable
where id not in 
(select min(id) as min from (select * from myTable) as x group by title)

The problem was I couldn't specify the 'myTable' table as a target table. so I used (select * from myTable) as x and figured it out.

I am sorry I can't explain more detail about that because I am not familiar with mysql query. But you should note that:

MySql does not allow the direct use of the target table inside a subquery like the one you use with NOT IN, but you can overcome this limitation by enclosing the subquery inside another one. (Please reference @forpas 's answer.)

But you have to notice this takes so long time... It might cause the time out error. I ran this query for table with about 600,000 rows but it didn't response for several days. So I conclude this idea is pretty fit to small database table.

I hope this is helpful for everyone! :)

How to delete duplicates from one table, but keeping only one record?

4 Answers4