SELECT with WHERE IN and subquery extremely slow

Question

I want to exectute the following query:

SELECT *
FROM `bm_tracking`
WHERE `oid` IN
    (SELECT `oid`
    FROM `bm_tracking`
    GROUP BY `oid` HAVING COUNT(*) >1)

The subquery:

SELECT  `oid` 
FROM  `bm_tracking` 
GROUP BY  `oid` 
HAVING COUNT( * ) >1

executes in 0.0525 secs

The whole query "stucks" (still processing after 3 minutes...). Column oid is indexed.

Table bm_tracking contains around 64k rows.

What could be the reason for this "stuck"?

[Edit: Upon request]

CREATE TABLE `bm_tracking` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `oid` varchar(10) NOT NULL,
 `trk_main` varchar(50) NOT NULL,
 `tracking` varchar(50) NOT NULL,
 `label` text NOT NULL,
 `void` int(11) NOT NULL DEFAULT '0',
 `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 PRIMARY KEY (`id`),
 KEY `oid` (`oid`),
 KEY `trk_main` (`trk_main`),
 KEY `tracking` (`tracking`),
 KEY `created` (`created`)
) ENGINE=MyISAM AUTO_INCREMENT=63331 DEFAULT CHARSET=latin1

[Execution Plan] enter image description here

Perhaps I'm wrong, but I don't see the reason for having the sub-query... — ʰᵈˑ, Oct 01 '14 at 16:51
So the subquery should give me all `oid` that are duplicate (works as it should) and the 'main query` should then outpul all rows with these `oid`s. — Andreas Hinderberger, Oct 01 '14 at 16:55
It's "executed" from php, so i could easily split it down to 2 requests, but all in one would be nicer — Andreas Hinderberger, Oct 01 '14 at 16:56
execution plan... run your query with 'EXPLAIN' right before the first select — John Ruddell, Oct 01 '14 at 17:01
Maybe you have very skewed cardinality on oid,but still that doesnt explain the slowness.I think you ran into a famous mysql bug with subqueries. — Mihai, Oct 01 '14 at 17:08
@Mihai do you think that that would make the execution plan not consider the IN() a dependent subquery? — John Ruddell, Oct 01 '14 at 17:12
@JohnRuddell Mysql internally transforms any IN subquery to EXISTS .While your answer is correct it doesnt get to the heart of the issue,which is why sometimes it fails to do so,like here.Thats why i mentioned a bug in mysql. — Mihai, Oct 01 '14 at 17:16

John Ruddell · Accepted Answer · 2014-10-01T17:45:08.187

2

Generally exists EXISTS faster than IN so you can try this and see if it executes better for you

SELECT *
FROM `bm_tracking` bt
WHERE EXISTS
(   SELECT 1
    FROM `bm_tracking` bt1
    WHERE bt.oid = bt1.oid
    GROUP BY `oid` 
    HAVING COUNT(*) >1
)

EDIT:

if you notice from the EXPLAIN you posted... the IN() is considered as a DEPENDENT SUBQUERY which is a correlated subquery... meaning that for every row in the table all rows in the table are pulled and compared... so for example 1000 rows in the table would mean 1000 * 1000 = 1 million comparisons -- thats why its taking such a long time

edited Oct 01 '14 at 17:45

answered Oct 01 '14 at 16:56

John Ruddell

25,283
6
57
86

1

Too bad Mysql does not support with statement. – Beri Oct 01 '14 at 17:01
Thanks John, that's working like a charm.. but i still wonder why the IN query "stucks".... – Andreas Hinderberger Oct 01 '14 at 17:03
@AndreasHinderberger read [**My Question**](http://stackoverflow.com/questions/25756112/mysql-exists-vs-in-correlated-subquery-vs-subquery) ... exists can optimize the execution plan a lot better than IN :) – John Ruddell Oct 01 '14 at 17:05
@AndreasHinderberger I just edited my question with an explanation why its taking longer – John Ruddell Oct 01 '14 at 17:12
Thanks John, but it's crazy. So in cases like that (if i don't wanna use the EXISTS contruct) it's faster to just use 2 queries? – Andreas Hinderberger Oct 01 '14 at 17:12
@AndreasHinderberger it may be better to do it that way... you would need to play around with the performance of it... you could also try using group_concat with user defined variables and see if it works better that way – John Ruddell Oct 01 '14 at 17:14
Are you saying that your 'exists select' is a NOT a 'correlated subquery'? What is happening than? I really would like to know. – Ryan Vincent Oct 01 '14 at 17:27
@RyanVincent Exists is considered a dependent subquery but it is not executed like one... look 5 comments up at the link I posted.. I asked this exact question because I was confused on why it was running faster. :) – John Ruddell Oct 01 '14 at 17:34
1

@JohnRuddell : i can't edit your answer, there's a "typo". The ` in front of EXISTS. Thanks :D – Andreas Hinderberger Oct 01 '14 at 17:44
However, depending on the 'optimizer' and the 'stored statistics' and the data distribution, your answer being fast could be considered 'lucky'? Or are you saying that 'exists select' is normally better? I am not being 'awkward', i really would like to know, especially in the case of 'mysql' with 'InnoDB'. I suspect that 'exists select' will use indexes and 'IN select' needs more clues. – Ryan Vincent Oct 01 '14 at 18:01
@RyanVincent No, MySQL internally compiles IN to EXISTS but it has a bug with subqueries where it sometimes doesn't do that (which causes IN to be slower).. exists select is better – John Ruddell Oct 01 '14 at 18:30
Thanks for pointing out the bug with which ' MySQL internally compiles IN to EXISTS ': where it sometimes doesn't do that (which causes IN to be slower).. exists select is better'. – Ryan Vincent Oct 01 '14 at 19:15
@RyanVincent sure thing... if you read the answers to that question I made [**HERE**](http://stackoverflow.com/q/25756112/2733506) they have a lot of other reasons why its faster execution plan wise :) – John Ruddell Oct 01 '14 at 19:16
Not dis-agreeing with you. I like 'select exists' when the indexes are appropriately 'selective' and the 'optimizer' uses them. However, not sure to always use 'exists select'. – Ryan Vincent Oct 01 '14 at 19:22
@RyanVincent i know you aren't :) i would recommend if you have to make a choice between in or exists to just use exists... not all queries will need exists though :) – John Ruddell Oct 01 '14 at 19:23

SELECT with WHERE IN and subquery extremely slow

1 Answers1

EDIT: