2

I wanted to delete lots of rows from medium size (700K) table, based on primary key. Thought, the best way should use SELECT-subquery for DELETE source list. And found specific answer here too. Problem is: it is so much slower than using two separate queries (first select IDs and then delete those IDs from table). Why is that so?

I made simple test case too:

CREATE TABLE `xyz` (
  `xyzID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`xyzID`)
) ENGINE=InnoDB;

Populated it with million records, and then:

DELETE FROM xyz
WHERE xyzID IN
        (
        SELECT xyzID
        FROM
            (
                SELECT xyzID
                FROM xyz
                LIMIT 3000,1000
            ) a
        );
Query OK, 1000 rows affected (53.52 sec)

Deleting 2000 rows doubles time:

Query OK, 2000 rows affected (1 min 48.25 sec)

But deleting without subquery (made select first) took almost no time (id-list generated by random, here):

DELETE FROM test.xyz WHERE xyzID IN ( 660422,232794,573802,....
Query OK, 996 rows affected (0.04 sec)

Why is deleting with subquery so slow?

Community
  • 1
  • 1
w.k
  • 8,218
  • 4
  • 32
  • 55
  • Limit without order by is meaning less, you may need to do a order by before applying limit. The best is to do order by primary key so that the indexes are resolved in the order by and hence by making the first query faster. – Abhik Chakraborty Nov 20 '15 at 09:35
  • @AbhikChakraborty this does not affect the deleting speed, in my real case I use much more complicated `SELECT` with `ORDER` and `LIMIT`. Problem stays. Same select from example above takes separately 0.01 s – w.k Nov 20 '15 at 09:38
  • See this answer here: http://dba.stackexchange.com/questions/1371/problem-with-mysql-subquery/1384#1384 – dognose Nov 20 '15 at 09:45
  • You are comparing apples with pears. Time the subquery that returns the list of ids and the delete with the list of ids together and compare that time to the delete with subquery. – Shadow Nov 20 '15 at 09:51
  • @Shadow So I did, if I run those queries separately it takes about 0.05s (0.01+0.04), but as subquery it takes 53s – w.k Nov 20 '15 at 09:55
  • IN the first query for each id to be deleted you ran that subquery which also contains a LIMIT- slow.In short the 2 methods are not equivalent.It`s easy to see if you get the EXPLAIN FOR both. – Mihai Nov 20 '15 at 09:57
  • While this doesn't answer your timing question, can you not avoid the sub query entirely by using an ORDER and LIMIT (with the complex WHERE clauses you need to use) on the DELETE itself? – Kickstart Nov 20 '15 at 10:45
  • @Kickstart yes, but DELETE does not support LIMIT offset – w.k Nov 20 '15 at 11:55
  • It does on single tables ( https://dev.mysql.com/doc/refman/5.6/en/delete.html ) . Is your delete over multiple tables? – Kickstart Nov 20 '15 at 12:06
  • @Kickstart hmm, dosc says only `row_count`, not `offset, rowcount` – w.k Nov 20 '15 at 12:12
  • True, but whether that is an issue will come with the in depth details of what you are doing, – Kickstart Nov 20 '15 at 13:08

3 Answers3

5

If you read the documentation on subqueries, you will find some things that might be the cause for this: https://dev.mysql.com/doc/refman/5.7/en/subquery-restrictions.html

The optimizer will rewrite your uncorrelated WHERE IN (Subquery) statements to correlated statements using exists.

So, your query might actually be executed like this:

DELETE FROM xyz t1
WHERE EXISTS (
    (
    SELECT 1
    FROM
        (
            SELECT xyzID t3
            FROM xyz
            LIMIT 3000,1000
        ) a
    where t1.xyzID = a.xyzID
    );

The correlated subquery now needs to be executed Everytime a single row is deleted.

So: For 1000 deletions, you will run 1000 subqueries on the temporary table a. Only the inner query will remain uncorrelated.

Compared to in(valuelist) you are running 1001 queries rather than 1.

docu:

An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.

dognose
  • 20,360
  • 9
  • 61
  • 107
  • My thoughts were wandering on the same road, but I did not saw the reasoning, why the subquery should run X time? mysql optimizer seems to work for the completely opposite direction here. Still, thank you for pointing out all those details from docs. – w.k Nov 20 '15 at 10:11
  • @w.k Sometimes the optimizer takes wrong decissions. You should rewrite it as a join if possible. However to give you a solution on that we would Need more Details about the actual query, since the "random" inner select you are doing doesn't outline any conditions for the ids you need to delete. Basically you should be able to just take your INNER query, and replace `SELECT id` with `DELETE FROM` to achieve what you Need - no need for nesting subqueries: `DELETE FROM WHERE ID in (SELECT ID where condition)` equals `DELETE FROM WHERE condition` – dognose Nov 20 '15 at 12:54
0

The first step to solving this problem is to select the ids you want to delete into a temporary table. However, you might still run into the slow subquery problem when you try to actually do the delete.

The solution to that is to use DELETE xyz FROM xyz INNER JOIN xyz_temp WHERE xyz.id = xyz_temp.id syntax, which achieves the same thing and runs as fast as a simple join.

grahamparks
  • 16,130
  • 5
  • 49
  • 43
-1

Subqueries means you are asking your db engine to compare all the "N" rows in the first table with all the "M" rows in another table you are creating in that moment. That's means you have N*M compare operation and to do it, you need to join the tables. The table you are building have N * M rows.

Without subquery you are just comparing all the "N" row in your table with "X" keywords where "X" << "M".

Luca Giardina
  • 478
  • 4
  • 14