1

I found some strange(for me) behavour in MySQL. I have a simple query:

SELECT CONVERT( `text`.`old_text` 
USING utf8 ) AS stext
FROM  `text` 
WHERE  `text`.`old_id` IN 
(    
 SELECT  `revision`.`rev_text_id`
 FROM  `revision` 
 WHERE  `revision`.`rev_id`
 IN 
  (    
  SELECT `page_latest`
  FROM  `page` 
  WHERE `page_id` = 108
  )
)

when i run it, phpmyadmin show execution time of 77.0446 seconds.
But then i replace

WHERE  `text`.`old_id` IN 

by

WHERE  `text`.`old_id` = 

it's execution time falls to about 0.001 sec. Result of this query

 SELECT  `revision`.`rev_text_id`
 FROM  `revision` 
 WHERE  `revision`.`rev_id`
 IN 
  (    
  SELECT `page_latest`
  FROM  `page` 
  WHERE `page_id` = 108
  )    

is

+------------+
|rev_text_id |
+------------+
|6506        |
+------------+

Can somebody please explain this behavour?

alex_java_kotlin
  • 429
  • 6
  • 19

3 Answers3

2

try to add INDEX on the following columns,

ALTER TABLE `text` ADD INDEX idx_text (old_id);
ALTER TABLE `revision` ADD INDEX idx_revision (rev_text_id);

and Execute the following query

SELECT  DISTINCT CONVERT(a.`old_text` USING utf8 ) AS stext
FROM    `text` a
        INNER JOIN `revision` b
            ON a.`old_id` = b.`rev_text_id`
        INNER JOIN `page` c
            ON b.`rev_id` = c.`page_latest`
WHERE   c.`page_id` = 108

PS: Can you run also the following query and post their respective results?

DESC `text`;
DESC `revision`;
DESC `page`;
John Woo
  • 258,903
  • 69
  • 498
  • 492
1

MySQLDB is looping through each result of the inner query and comparing it with each record in the outer query. in the second inner query;

   WHERE  `revision`.`rev_id`
   IN 
   ( SELECT `page_latest`
     FROM  `page` 
     WHERE `page_id` = 108

you should definitely use '=' instead of IN, since you're selecting a distinct record, there would be no point in looping through a result when you know only one record will be returned each time

pythonian29033
  • 5,148
  • 5
  • 31
  • 56
1

There are two primary ways you can increase your query performance here

  • Add Indexes (such as Kuya mentioned)
  • Rid yourself of the subqueries where possible

For Indexes, add an index on the columns you are searching for your matches: text.old_id, revision.rev_text_id & page.page_id

ALTER TABLE `text` ADD INDEX idx_text (old_id);
ALTER TABLE `revision` ADD INDEX idx_revision (rev_text_id);
ALTER TABLE `page` ADD INDEX idx_page (page_id);

Your next issue is that nested-sub-selects are hell on your query execution plan. Here is a good thread discussing JOIN vs Subquery. Here is an article on how to get execution plan info from mySQL.

First looks at an execution plan can be confusing, but it will be your best friend when you have to concern yourself with query optimization.

Here is an example of your same query with just joins ( you could use inner or left and get pretty much the same result). I don't have your tables or data, so forgive synax issues (there is no way I can verify the code works verbatim in your environment, but it should give you a good starting point).

SELECT 
  CONVERT( `text`.`old_text` USING utf8 ) AS stext
  FROM  `text` 
  -- inner join only returns rows when it can find a 
  --    matching `revision`.`rev_text_id` row to `text`.`old_id`
  INNER JOIN `revision`  
  ON `text`.`old_id` = `revision`.`rev_text_id` 
  -- Inner Join only returns rows  when it can find a 
  --  matching `page_latest` row to `page_latest`
  INNER JOIN `page`
  ON `revision`.`rev_id` = `page`.`page_latest`
WHERE `page`.`page_id` = 108
Community
  • 1
  • 1
Ray K
  • 1,452
  • 10
  • 17