I'm curious about how the execution of EXISTS()
is supposed to be faster than IN()
.
I was answering a question when Bill Karwin brought up a good point. when you use EXISTS()
it is using a correlated subquery (dependent subquery) and IN() is only using a subquery.
EXPLAIN shows that EXISTS
and NOT EXISTS
both use a dependent subquery and IN / NOT IN
both use just a subquery.. so I'm curious how a correlated subquery is faster than a subquery??
I've used EXISTS before and it does execute faster than IN which is why I'm confused.
Here is a SQLFIDDLE with the explains
EXPLAIN SELECT COUNT(t1.table1_id)
FROM table1 t1
WHERE EXISTS
( SELECT 1
FROM table2 t2
WHERE t2.table1_id <=> t1.table1_id
);
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY |KEY_LEN | REF | ROWS | EXTRA |
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
| 1 | PRIMARY | t1 | index | (null) | PRIMARY | 4 | (null) | 4 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | t2 | REF | table1_id | table1_id| 4 | db_9_15987.t1.table1_id | 1 | Using where; Using index |
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
EXPLAIN SELECT COUNT(t1.table1_id)
FROM table1 t1
WHERE NOT EXISTS
( SELECT 1
FROM table2 t2
WHERE t2.table1_id = t1.table1_id
);
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY |KEY_LEN | REF | ROWS | EXTRA |
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
| 1 | PRIMARY | t1 | index | (null) | PRIMARY | 4 | (null) | 4 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | t2 | ref | table1_id | table1_id| 4 | db_9_15987.t1.table1_id | 1 | Using index |
+-------+-----------------------+-----------+-------+---------------+-----------+--------+--------------------------+--------+------------------------------+
EXPLAIN SELECT COUNT(t1.table1_id)
FROM table1 t1
WHERE t1.table1_id NOT IN
( SELECT t2.table1_id
FROM table2 t2
);
+-------+-------------------+-----------+-------+---------------+-----------+--------+----------+--------+------------------------------+
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY |KEY_LEN | REF | ROWS | EXTRA |
+-------+-------------------+-----------+-------+---------------+-----------+--------+----------+--------+------------------------------+
| 1 | PRIMARY | t1 | index | (null) | PRIMARY | 4 | (null) | 4 | Using where; Using index |
| 2 | SUBQUERY | t2 | index | (null) | table1_id| 4 | (null) | 2 | Using index |
+-------+-------------------+-----------+-------+---------------+-----------+--------+----------+--------+------------------------------+
FEW questions
In the explains above, how does EXISTS have using where
and using index
in extras but NOT EXISTS does not have using where
in extras?
How is a correlated subquery faster than a subquery?