
I need to delete duplicates from a table while keeping one row per group. Since MySQL does not allow deleting from a table that is also referenced in a subquery in the WHERE clause, I decided to store the affected IDs in a temporary table:

create temporary table my_temp_table (
  id int not null
) engine = memory;

and then insert IDs using a select:

insert into my_temp_table
  -- select query works
  select
    id
  from
    -- innodb
    table_with_duplicates
  where
    hash_code in (
    select
      hash_code
    from
      table_with_duplicates
    group by
      hash_code
    having
      count(id) > 1
    )
    and date_created < '2015-01-01'
;
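(For context: older MySQL versions often execute an `in (subquery)` as a dependent subquery that is re-run for every row of the outer table, which can produce exactly this kind of CPU spin. A join against a derived table usually avoids that. The following is a sketch of the same insert rewritten that way, using only the tables and columns from the question:)

```sql
-- same logic as the insert above, but the duplicate hash codes are
-- materialized once in a derived table instead of a dependent subquery
insert into my_temp_table (id)
select
  t.id
from
  table_with_duplicates t
  join (
    select hash_code
    from table_with_duplicates
    group by hash_code
    having count(id) > 1
  ) dup on dup.hash_code = t.hash_code
where
  t.date_created < '2015-01-01';
```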

Later I want to use these IDs to delete them:

delete from table_with_duplicates
  where id in (
    select id from my_temp_table
  )
;
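(If that `in` subquery is also slow, MySQL's multi-table delete syntax lets you join the temp table directly instead; a minimal sketch, assuming the tables from the question:)

```sql
-- delete rows whose id appears in the temp table, via a join
delete t
from table_with_duplicates t
join my_temp_table m on m.id = t.id;
```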

Just executing the SELECT part of the INSERT statement works fine. Adding the INSERT part, however, drives one CPU core to 100% and the query seems to never finish. Nothing is inserted. On my dev environment, table_with_duplicates contains about 20,000 rows with one duplicate. Any ideas?

EDIT: Thank you for your answers. I have tried a `select distinct(...)` approach, which didn't help much; maybe I used it in the wrong place/subselect. I have played around with it quite a lot already. To clarify, I have something like this:

ID    date_created    hash_code
1     2013-06-06      ABCDEFGH     <-- delete this one
2     2013-08-08      HGFEDCBA
3     2015-11-11      ABCDEFGH     <-- keep this one
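(Given data like the above, the duplicates can also be removed in one statement with a self-join, keeping the newest row per hash_code. This is a sketch along the lines of the approach @siride linked, assuming date_created is distinct within each hash_code group; if ties are possible, add `id` as a tiebreaker:)

```sql
-- for every pair of rows sharing a hash_code, delete the older one;
-- only the row with the greatest date_created per hash_code survives
delete older
from table_with_duplicates older
join table_with_duplicates newer
  on newer.hash_code = older.hash_code
 and newer.date_created > older.date_created;
```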
  • Take a look at the answer by @ajreal for your delete with subquery problem, and see if it helps: http://stackoverflow.com/questions/4471277/mysql-delete-from-with-subquery-as-condition – shree.pat18 Nov 26 '15 at 10:18
  • Why didn't you use select distinct and assign the results to your temporary table? – Burak Karasoy Nov 26 '15 at 10:31
  • @ajreal solved a similar problem (subselect in insert), but, as I understand it, a different use case. By the way, it was that post where I learned that you can't access the same table in an insert/where/select combination. – Beat Luginbühl Nov 26 '15 at 12:47
  • Take a look here: http://stackoverflow.com/a/25492238/394487 – siride Nov 26 '15 at 15:13
  • Well, @siride, you made my day :-) This is a rather easy way to remove duplicate items, which could have saved me a lot of time. Thanks for that. This is a potential answer post. I am still puzzled by the effect my first approach had on system load. – Beat Luginbühl Nov 29 '15 at 14:20

1 Answer


Well, I chose a different approach to solve this problem. I wrote a small PHP command-line script that fetches all my IDs into an array and then executes a single DELETE using all the IDs:

"delete from table_with_duplicates where id in (".explode(',' $arrIDs).")";

With about 9000 affected rows, and this (hopefully) being a one-time action, this worked for me.

I also played around with

SET SESSION optimizer_search_depth = 1;

but had no luck either.