16

Ok so there are couple posts here already on this and fewer still out on the web. I've literally tried every one of them and can not get anything to work. Hopefully someone here can take pity on me :)

Here is the data I'm working with. I want to delete all these records.

SELECT
part_desc, count(*) as rec_num
FROM ag_master
GROUP BY part_desc HAVING COUNT(*) > 1000;

+--------------------------------------+---------+
| part_desc                            | rec_num |
+--------------------------------------+---------+
| SILICON DELAY LINE, TRUE OUTPUT      |    1092 |
| LOADABLE PLD                         |    1401 |
| 8-BIT, FLASH, 8 MHz, MICROCONTROLLER |    1411 |
| FPGA                                 |    1997 |
| 8-BIT, MROM, 8 MHz, MICROCONTROLLER  |    3425 |
+--------------------------------------+---------+
5 rows in set (0.00 sec)

The closest I've come to finding code that would do it is shown below. The syntax checks ok and it runs, however it just seems to hang the database up. I've let it run for as long as 10 minutes and nothing ever happens so I abort it.

DELETE
FROM ag_master
WHERE part_id IN (
  SELECT part_id
  FROM ag_master
  GROUP BY part_desc
  HAVING COUNT(*) > 1000
);

Here's the explain plan on the tmp table

mysql> EXPLAIN SELECT * FROM ag_master WHERE part_desc IN (SELECT part_desc FROM tmp);
+----+--------------------+-----------+--------+---------------+------+---------+------+--------+-------------+
| id | select_type        | table     | type   | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+--------------------+-----------+--------+---------------+------+---------+------+--------+-------------+
|  1 | PRIMARY            | ag_master | ALL    | NULL          | NULL | NULL    | NULL | 177266 | Using where |
|  2 | DEPENDENT SUBQUERY | tmp       | system | NULL          | NULL | NULL    | NULL |      1 |             |
+----+--------------------+-----------+--------+---------------+------+---------+------+--------+-------------+
2 rows in set (0.00 sec)
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
RFQ Master
  • 205
  • 1
  • 3
  • 10
  • Why are the two select queries different, your first one differs from your second one? Also I take it you want to delete 1092+1401+1411+1997+3425 = 9,326 records? – Rippo May 11 '12 at 15:51
  • Not sure what you mean? One is just showing you the data I want to get rid of, the other is what the recommended way to format a delete statement while using the having count. – RFQ Master May 11 '12 at 15:53
  • Yes, I want to get rid of all 9k + records. – RFQ Master May 11 '12 at 15:55
  • The first query `DOES NOT` return the part_id, add that to the first query and then you should what I mean! – Rippo May 11 '12 at 15:56
  • It's the whole row I'm trying to get rid of Rippo – RFQ Master May 11 '12 at 16:03
  • `SELECT part_id ... GROUP BY part_desc` won't retrieve 9000+ records as you expect but only 5, a random one (most probably the first one in the data order on file) from each of the groups displayed by the first query. -- http://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html – axiac Aug 21 '15 at 23:03
  • See also https://stackoverflow.com/q/11705996/1066234 – Avatar Sep 05 '20 at 20:16

2 Answers2

23

As stated in the manual:

Currently, you cannot delete from a table and select from the same table in a subquery.

I think you'll have to perform this operation via a temporary table:

CREATE TEMPORARY TABLE temp
  SELECT   part_desc
  FROM     ag_master
  GROUP BY part_desc
  HAVING   COUNT(*) > 1000;

DELETE FROM ag_master WHERE part_desc IN (SELECT part_desc FROM temp);

DROP TEMPORARY TABLE temp;
eggyal
  • 122,705
  • 18
  • 212
  • 237
  • Thanks eggyal, I actually tried that but it only creates 5 records in the temporary table, which corresponds to the number of rows that it's grouped by but not the number of records? – RFQ Master May 11 '12 at 16:02
  • @RFQMaster: Yes, but then you delete all records from `ag_master` which match on any of those 5 descriptions (which is what you are after, no?). – eggyal May 11 '12 at 16:04
  • Ok, that doesn't seem to work either. It ran for about 12 minutes and then errored out with a '2013. Lost connect to MySQL server during query' – RFQ Master May 11 '12 at 16:19
  • Do you have an index on `part_desc`? Can you post in an edit to your question the output of `EXPLAIN SELECT * FROM ag_master WHERE part_desc IN (SELECT part_desc FROM temp)`? – eggyal May 11 '12 at 16:20
  • Well, that output makes it clear that you're having to do a full table scan for matching records; if you're often going to perform queries of this sort, I'd suggest building an index on `part_desc`: `ALTER TABLE ag_master ADD INDEX (part_desc)`, then the `DELETE` operation should run much faster. – eggyal May 11 '12 at 16:34
  • Yea It's not something I plan on doing often, just trying to get it down to a workable amount of records and then I'll be done with it. Going to try the index now. – RFQ Master May 11 '12 at 16:42
9

Another option is using an inner join to filter the result:

DELETE
    ag_master.*
FROM
    ag_master 

    INNER JOIN 
    (
        SELECT 
            part_id
        FROM 
            ag_master
        GROUP BY 
            part_desc
        HAVING COUNT(*) > 1000
    )AS todelete ON
            todelete.part_id = ag_master.part_id
  • SQL State : 42S22 Error Code : 1054 Message : Unknown column 'x.id' in 'on clause'. `delete d from domains d join (select max(id) from domains group by name, month having count(*) > 1) x on d.id = x.id` – Chloe Apr 11 '19 at 02:28