5

MySQL Server version: 5.1.41 On Ubuntu 10.04

I came across a difference in MySQL's behavior when modifying some queries and wanted to know the reason for it.

Basically I am creating a view. When I query the view, the result set is the same However, the number of rows read are different for an IN clause than for an OR clause. Below below is a simple example :

CREATE TABLE country ( 
    id_country int(11) NOT NULL AUTO_INCREMENT, 
    name varchar(50) NOT NULL, 
    PRIMARY KEY (id_country) 
) ENGINE=InnoDB; 

INSERT INTO country (name) VALUES ('A'), ('B'), ('C'), ('D'), ('E'), ('F'), ('G'), ('H'); 

CREATE TABLE status ( 
    id_status int(11) NOT NULL AUTO_INCREMENT, 
    id_country int(11) NOT NULL, 
    status tinyint(4) NOT NULL, 
    PRIMARY KEY (id_status) 
) ENGINE=InnoDB; 
ALTER TABLE status ADD INDEX ( id_country ); 
ALTER TABLE status ADD FOREIGN KEY ( id_country ) REFERENCES test.country (id_country) ON DELETE RESTRICT ON UPDATE RESTRICT ; 

INSERT INTO status(id_country, status) VALUES 
(1,0), (2,1), (3,0), (4,1), (5,0),(6,1), (7,0), (8,1); 

CREATE ALGORITHM=MERGE VIEW view_country 
AS 
    SELECT c.*, s.id_status, s.status 
    FROM country c JOIN status s ON c.id_country = s.id_country; 

The 2 explain statements below show different number of rows parsed

mysql> EXPLAIN EXTENDED  SELECT * FROM view_country WHERE id_country IN (1, 2, 3)\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: s
         type: ref
possible_keys: id_country
          key: id_country
      key_len: 4
          ref: test.c.id_country
         rows: 1
     filtered: 100.00
        Extra: 
2 rows in set, 1 warning (0.00 sec)

Using the OR Clause

mysql> EXPLAIN EXTENDED SELECT * FROM view_country WHERE id_country = 1 OR id_country = 2 OR id_country = 3\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: s
         type: ALL
possible_keys: id_country
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 8
     filtered: 37.50
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.s.id_country
         rows: 1
     filtered: 100.00
        Extra: 
2 rows in set, 1 warning (0.00 sec)

If you look at the "rows" in both the queries - they add up differently

The query with OR clause reads less rows compared to IN, which adds up for huge tables and joins.

Can someone help me understand why this is so ?

Thank you for your time.

itz_nsn
  • 648
  • 1
  • 8
  • 13
  • Consistently the same but different results? – Marcus Adams Feb 22 '12 at 18:40
  • @Marcus - i am sry i did not understand the question - if you meant i get consistent result set everytime and the number of rows parsed every time is also consistent - then the answer is yes – itz_nsn Feb 23 '12 at 17:39
  • @Marcus Adams - there was a copy paste blunder on my part - i have corrected it. The issue is not about different result sets - the result sets are the same -but the number of rows read are different between IN vs OR - let me know if you are not able to reproduce it - i am on server version: 5.1 – itz_nsn Feb 23 '12 at 22:04
  • @MarcusAdams - i think i am getting confused because of this comment you wrote which clearly indicates that it's not from a simple SELECT, and not from the queries that you are showing - did u see that i was querying from a view? did you see my view definition? to make sure i was sane i created an empty database - ran all the queries i have mentioned, then ran the explain statements - no extra queries or select statements - did you try to create the tables using the sql i have given? - if you did and you still feel that there is something missing - let me know – itz_nsn Feb 24 '12 at 03:27
  • I'm sorry, I did NOT realize that you were querying a view. I've provided my answer below. – Marcus Adams Feb 24 '12 at 14:15

2 Answers2

1

Please note that execution plans have a lot to do with the state of your indexes and the size of your tables. MySQL may execute differently even for similar queries, and sometimes MySQL can even guess wrong.

The view with the JOIN definitely complicates things, so your SELECT statement is not so simple. Don't be surprised that MySQL chooses a different execution plan for IN versus OR.

In the case of the first query, MySQL has chosen to use indexes for both queries, which results in the specific and accurate row count in the EXPLAIN.

However, in the second query, MySQL has chosen to scan all the rows in the status table. This makes sense since there are so few rows and MySQL has to visit the table anyway because there is no covering index that will return all needed rows. I wouldn't be surprised if the second query isn't actually faster than the first. Also, please note that the row count (for scans) in EXPLAIN are estimates, so take that into consideration when profiling your queries.

The first query has to do 6 lookups, whereas the second query only has to do 3 lookups after a very short table scan.

There are many tricks that MySQL does that are sometimes limited to very specific scenarios to try to optimize your query, based on the current indexes and row counts. There are documented cases where, for similar queries, MySQL will take two different approaches and end up with the same execution path. There are other cases where two completely different execution plans result in similar performance, and this is one of those cases.

Anyway, I hope this explains to you why there is a difference, but as long as the results are the same, and the performance is similar, there's nothing to be concerned about.

In some cases, as I said earlier, MySQL will just not make the best guess, and then you can use tools like index hints and natural joins. In your case, I think MySQL is behaving just fine.

To research performance and execution plans more check out the two following sites:

Marcus Adams
  • 53,009
  • 9
  • 91
  • 143
  • thanks for the explanation Marcus - as a good practice, i think i need to go through any complex join queries and see how mysql selects the indices – itz_nsn Mar 01 '12 at 04:04
0

If I understood properly, the results you get are the same, and that you'd like to know the difference between the 'OR' and 'IN' clause regarding speed and how they work.

If so, than I think your question is a possible duplicate of this one: IN vs OR in the SQL WHERE Clause

Community
  • 1
  • 1
Honnes
  • 325
  • 1
  • 12
  • I believe the OP indicated that results are different using "IN" as opposed to "OR" – Zack Macomber Feb 22 '12 at 18:49
  • Ah ok, then I didn't understand his question properly. I would have thought that the results would be the same. – Honnes Feb 23 '12 at 07:27
  • @ZackMacomber , from the question: *"the result set is the same"*. – ypercubeᵀᴹ Feb 23 '12 at 13:41
  • 1
    i provided some more info on the question including the server version and what the output of the EXPLAIN EXTENDED is. The server version might matter because one person said he could not replicate it on mysql 5.6 - the result set is same but the number of rows read to produce the result is different - that is what i am interested in - – itz_nsn Feb 23 '12 at 13:43
  • @ypercube - am I not seeing something correct here? I just did a find on this page for "the result set is the same" in Google Chrome and I only come back with 1 result which is found in your comment. – Zack Macomber Feb 23 '12 at 13:57
  • The question was edited 1 minute after my comment. See [Revisions](http://stackoverflow.com/posts/9400797/revisions) (not sure if this page will be visible by you or not, due to reputation restrictions. – ypercubeᵀᴹ Feb 23 '12 at 14:00
  • @ypercube - ok - just looked at the previous edits on this post and that phrase was found in one of the revisions. When I looked at the post and responded to it yesterday I didn't see that particular phrase and also the wording seemed to indicate there were different results. naveen has clarified in the comments section that "the result set is same" so we have clear direction here now. – Zack Macomber Feb 23 '12 at 14:02