I doubt whether LEFT JOIN
truly perform better than NOT IN
. I just perform a few tests with the following table structure (if I am wrong please correct me):
account (id, ....) [42,884 rows, index by id]
play (account_id, playdate, ...) [61,737 rows, index by account_id]
(1) Query with LEFT JOIN
SELECT * FROM
account LEFT JOIN play ON account.id = play.account_id
WHERE play.account_id IS NULL
(2) Query with NOT IN
SELECT * FROM
account WHERE
account.id NOT IN (SELECT play.account_id FROM play)
Speed test with LIMIT 0,...
LIMIT 0,-> 100 150 200 250
-------------------------------------------------------------------------
LEFT 3.213s 4.477s 5.881s 7.472s
NOT EXIST 2.200s 3.261s 4.320s 5.647s
--------------------------------------------------------------------------
Difference 1.013s 1.216s 1.560s 1.825s
As I increase the the limit, the difference is getting larger and larger
With EXPLAIN
(1) Query with LEFT JOIN
SELECT_TYPE TABLE TYPE ROWS EXTRA
-------------------------------------------------
SIMPLE account ALL 42,884
SIMPLE play ALL 61,737 Using where; not exists
(2) Query with NOT IN
SELECT_TYPE TABLE TYPE ROWS EXTRA
-------------------------------------------------
SIMPLE account ALL 42,884 Using where
DEPENDENT SUBQUERY play INDEX 61,737 Using where; Using index
It seem like the LEFT JOIN does not make use of index
LOGIC
(1) Query with LEFT JOIN
After LEFT JOIN between account and play will produce 42,884 * 61,737
= 2,647,529,508 rows. Then check if play.account_id is NULL on those rows.
(2) Query with NOT IN
Binary search takes log2(N) for item existence. That's mean 42,884 * log2(61,737) = 686,144 steps