1

I want to exectute the following query:

SELECT *
FROM `bm_tracking`
WHERE `oid` IN
    (SELECT `oid`
    FROM `bm_tracking`
    GROUP BY `oid` HAVING COUNT(*) >1)

The subquery:

SELECT  `oid` 
FROM  `bm_tracking` 
GROUP BY  `oid` 
HAVING COUNT( * ) >1

executes in 0.0525 secs

The whole query "stucks" (still processing after 3 minutes...). Column oid is indexed.

Table bm_tracking contains around 64k rows.

What could be the reason for this "stuck"?

[Edit: Upon request]

CREATE TABLE `bm_tracking` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `oid` varchar(10) NOT NULL,
 `trk_main` varchar(50) NOT NULL,
 `tracking` varchar(50) NOT NULL,
 `label` text NOT NULL,
 `void` int(11) NOT NULL DEFAULT '0',
 `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 PRIMARY KEY (`id`),
 KEY `oid` (`oid`),
 KEY `trk_main` (`trk_main`),
 KEY `tracking` (`tracking`),
 KEY `created` (`created`)
) ENGINE=MyISAM AUTO_INCREMENT=63331 DEFAULT CHARSET=latin1

[Execution Plan] enter image description here

Andreas Hinderberger
  • 1,505
  • 23
  • 41

1 Answers1

2

Generally exists EXISTS faster than IN so you can try this and see if it executes better for you

SELECT *
FROM `bm_tracking` bt
WHERE EXISTS
(   SELECT 1
    FROM `bm_tracking` bt1
    WHERE bt.oid = bt1.oid
    GROUP BY `oid` 
    HAVING COUNT(*) >1
)

EDIT:

if you notice from the EXPLAIN you posted... the IN() is considered as a DEPENDENT SUBQUERY which is a correlated subquery... meaning that for every row in the table all rows in the table are pulled and compared... so for example 1000 rows in the table would mean 1000 * 1000 = 1 million comparisons -- thats why its taking such a long time

John Ruddell
  • 25,283
  • 6
  • 57
  • 86
  • 1
    Too bad Mysql does not support with statement. – Beri Oct 01 '14 at 17:01
  • Thanks John, that's working like a charm.. but i still wonder why the IN query "stucks".... – Andreas Hinderberger Oct 01 '14 at 17:03
  • @AndreasHinderberger read [**My Question**](http://stackoverflow.com/questions/25756112/mysql-exists-vs-in-correlated-subquery-vs-subquery) ... exists can optimize the execution plan a lot better than IN :) – John Ruddell Oct 01 '14 at 17:05
  • @AndreasHinderberger I just edited my question with an explanation why its taking longer – John Ruddell Oct 01 '14 at 17:12
  • Thanks John, but it's crazy. So in cases like that (if i don't wanna use the EXISTS contruct) it's faster to just use 2 queries? – Andreas Hinderberger Oct 01 '14 at 17:12
  • @AndreasHinderberger it may be better to do it that way... you would need to play around with the performance of it... you could also try using group_concat with user defined variables and see if it works better that way – John Ruddell Oct 01 '14 at 17:14
  • Are you saying that your 'exists select' is a NOT a 'correlated subquery'? What is happening than? I really would like to know. – Ryan Vincent Oct 01 '14 at 17:27
  • @RyanVincent Exists is considered a dependent subquery but it is not executed like one... look 5 comments up at the link I posted.. I asked this exact question because I was confused on why it was running faster. :) – John Ruddell Oct 01 '14 at 17:34
  • 1
    @JohnRuddell : i can't edit your answer, there's a "typo". The ` in front of EXISTS. Thanks :D – Andreas Hinderberger Oct 01 '14 at 17:44
  • However, depending on the 'optimizer' and the 'stored statistics' and the data distribution, your answer being fast could be considered 'lucky'? Or are you saying that 'exists select' is normally better? I am not being 'awkward', i really would like to know, especially in the case of 'mysql' with 'InnoDB'. I suspect that 'exists select' will use indexes and 'IN select' needs more clues. – Ryan Vincent Oct 01 '14 at 18:01
  • @RyanVincent No, MySQL internally compiles IN to EXISTS but it has a bug with subqueries where it sometimes doesn't do that (which causes IN to be slower).. exists select is better – John Ruddell Oct 01 '14 at 18:30
  • Thanks for pointing out the bug with which ' MySQL internally compiles IN to EXISTS ': where it sometimes doesn't do that (which causes IN to be slower).. exists select is better'. – Ryan Vincent Oct 01 '14 at 19:15
  • @RyanVincent sure thing... if you read the answers to that question I made [**HERE**](http://stackoverflow.com/q/25756112/2733506) they have a lot of other reasons why its faster execution plan wise :) – John Ruddell Oct 01 '14 at 19:16
  • Not dis-agreeing with you. I like 'select exists' when the indexes are appropriately 'selective' and the 'optimizer' uses them. However, not sure to always use 'exists select'. – Ryan Vincent Oct 01 '14 at 19:22
  • @RyanVincent i know you aren't :) i would recommend if you have to make a choice between in or exists to just use exists... not all queries will need exists though :) – John Ruddell Oct 01 '14 at 19:23