5

Given this SQL query in MySQL:

SELECT * FROM tableA WHERE tableA.id IN (SELECT id FROM tableB);

Does MySQL execute the subquery SELECT id FROM tableB multiple times for each row in tableA?

Is there a way to make sql go faster without using variables or store procedures?

Why is this often slower than using LEFT JOIN?

Eric Leschinski
  • 146,994
  • 96
  • 417
  • 335
Reactgular
  • 52,335
  • 19
  • 158
  • 208

2 Answers2

11

Your assumption is false; the subquery will be executed only once. The reason why it's slower than a join is because IN can't take advantage of indexes; it has to scan its arguments once for each time the WHERE clause is evaluated, that is, once per row in tableA. You can optimize the query, without using variables or stored procedures, simply by replacing the IN with a join, thus:

SELECT tableA.field1, tableA.field2, [...]
FROM tableA 
  INNER JOIN tableB ON tableA.id = tableB.id

Unless you don't mind getting back every field from both tables, you do need to enumerate the fields you want in the SELECT clause; tableA.*, for example, will elicit a syntax error.

Aaron Miller
  • 3,692
  • 1
  • 19
  • 26
  • 4
    +1. Also a `LEFT JOIN` isn't the same thing. `INNER JOIN` is the answer. – Cfreak Sep 13 '13 at 16:17
  • @Cfreak Well, if I wanted to be picky, I could argue that `LEFT JOIN tableB on tableA.id = tableB.id AND tableB.id IS NOT NULL` would have the same effect if more verbosely, but that'd be silly. – Aaron Miller Sep 13 '13 at 16:21
  • 2
    Are you sure that in MySQL the subquery will be executed only once? Where does the assurance come from? – newtover Sep 13 '13 at 16:22
  • no indexes! That explains a lot, and enumerating the values for `IN` for each row is also going to slow it down. Thanks for the clarification. – Reactgular Sep 13 '13 at 16:23
  • @newtover From experience; feel free to confirm it yourself with any MySQL 5.0 instance that's configured to use a general query log. – Aaron Miller Sep 13 '13 at 16:24
  • @MathewFoscarini Glad to be of help! – Aaron Miller Sep 13 '13 at 16:25
  • How does the general log tell you whether the subquery is executed once or for each row? One should play with counters to be sure. But similar kind of subquries is usually not optimized by MySQL (it is from my experience). But this trivial one certaily might be optimized. – newtover Sep 13 '13 at 16:29
  • Moreover, if you'd select something other than the primary key (and not stored as a first field in any index), you would see a significant slowdown. I've just tried. – newtover Sep 13 '13 at 16:35
  • With JOIN, if tableB.id is not unique it will return the same row from tableA multiple times. With IN, matched rows are returned only once even if tableB.id has repeated values. – user34814 Jun 17 '18 at 19:03
5

First, this depends on the version of MySQL. I believe that version 5.6 optimizes such queries correctly. MySQL documentation is inconsistent on this. For instance, here it says one thing:

Consider the following subquery comparison:

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.

This "from outside to inside" means that the subquery is evaluated for each row. This is consistent with my experience with MySQL.

The documentation suggests otherwise here:

Some optimizations that MySQL itself makes are:

  • MySQL executes uncorrelated subqueries only once. Use EXPLAIN to make sure that a given subquery really is uncorrelated.
  • MySQL rewrites IN, ALL, ANY, and SOME subqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed.

I believe the statement does not refer to in clauses. Perhaps what happens is that the subquery is rewritten as a correlated subquery to check for indexes, and then it is run multiple times (regardless of the presence of a index).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Sure, anyone is not sure about this, please have a look at this question: "MySQL “IN” queries terribly slow with subquery but fast with explicit values" - https://stackoverflow.com/questions/5018284/mysql-in-queries-terribly-slow-with-subquery-but-fast-with-explicit-values – The Anh Nguyen Apr 08 '21 at 07:02
  • @TheAnhNguyen . . . That question is *10 years old*. The referenced version of MySQL in this answer was 8 years ago. – Gordon Linoff Apr 08 '21 at 12:42
  • Yesterday, I have same problem with MariaDB 5.5.60 (released in 2018). So I also want to know that what RBDMS optimizes such queries correctly and vice versa. – The Anh Nguyen Apr 09 '21 at 00:51
  • Oh, by MySQL 8.0 Reference link (https://dev.mysql.com/doc/refman/8.0/en/subquery-materialization.html). I will make a test with it later. It said that: The optimizer uses materialization to enable more efficient subquery processing. Materialization speeds up query execution by generating a subquery result as a temporary table, normally in memory. The first time MySQL needs the subquery result, it materializes that result into a temporary table. Any subsequent time the result is needed, MySQL refers again to the temporary table. – The Anh Nguyen Apr 09 '21 at 00:57