MySQL Query Performance - Subquery Vs Join

Question

I'm not a SQL expert, so am trying to understand why two queries have substantially different execution times.

common_stats is a large table (3 million rows). I'm just trying to understand why the huge difference in performance.

The below query takes ~15 seconds:

select distinct cs.basesalary_id
from common_stats AS cs
LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
WHERE (b.employee_id= 100 AND cs.amount > 0 AND cs.pay_id is null );

This query takes ~.1 seconds:

select distinct basesalary_id from (
    select cs.basesalary_id, cs.pay_id
    from common_stats AS cs
    LEFT JOIN basesalary AS b ON b.id = cs.basesalary_id
    WHERE (b.employee_id= 100 AND cs.amount > 0)
) as temp
where pay_id is null;

Have you tried putting `explain` in front of the query and comparing the execution plans? — Phylogenesis, Oct 07 '15 at 14:53
@e4c5 But this example is a subquery significantly outperforming a join. — Phylogenesis, Oct 07 '15 at 15:05
Bit of a duplicate question: http://stackoverflow.com/questions/2577174/join-vs-sub-query — Dwight Spencer, Oct 07 '15 at 15:05
Always use `EXPLAIN` (and do post its output here). Always ensure you're not pulling from query cache by issuing `RESET QUERY CACHE;` if you have two queries that operate on the same data. Another very, very important thing to bear in mind - you can optimize SQL to infinity and beyond but hardware plays a hugely important role. It's important to know which engine you're using and whether it uses your hardware optimally. — Mjh, Oct 07 '15 at 15:21

score 2 · Answer 1 · answered Oct 10 '15 at 18:23

As a general rule:

If the subquery 'shrinks' the number of rows (say, via GROUP BY or LIMIT), the subquery approach is better.
JOIN often creates more rows than the original tables. DISTINCT is sometime used to deflate that row count.
If the outer query has a GROUP BY, JOIN may create more rows than you realize and "inflate" any aggregates (SUM, COUNT, etc), thereby voting for subquery instead.
More than one subquery would lead to poor optimization. (Thereby voting for JOIN.)

All of those assume optimal indexes.

Your first query would probably benefit from

INDEX(pay_id, amount, basesalary_id) -- both "covering" and optimal for `WHERE`

Check out EXPLAIN SELECT ... for both queries. Probably the faster one starts with basesalary and there is INDEX(employee_id) and that is very selective.

I may have more comments after seeing SHOW CREATE TABLE.

When using a subquery, the explain shows the following sequence: 1 - PRIMARY - NO KEYS - 667154 rows - Extra: "Using where" 2 - DERIVED - NO KEYS - 667154 rows - Extra: none 3 - DERIVED - PRIMARY KEY - 1 row - Extra: none When using a join, explain shows this sequence: 1 - SIMPLE - NO KEYS - 667171 rows - Extra: none 2 - SIMPLE - PRIMARY KEY - 1 row - Extra: using where — cdm, Nov 12 '15 at 18:04
I'm sorry; that is too hard to read; please edit your Question to add it. — Rick James, Apr 21 '21 at 23:11

MySQL Query Performance - Subquery Vs Join

1 Answers1