-2

I have a table where I have a status field which can have values like 1,2,3,4,5. I need to select all the rows from the table with status != 1. I have the following 2 options:

NOTE that the table has INDEX over status field.

SELECT ... FROM my_tbl WHERE status <> 1;

or

SELECT ... FROM my_tbl WHERE status IN(2,3,4,5);

Which of the above is a better choice? (my_tbl is expected to grow very big).

Ethan
  • 4,915
  • 1
  • 28
  • 36

3 Answers3

2

You can run your own tests to find out, because it will vary depending on the underlying tables.

More than that, please don't worry about "fastest" without having first done some sort of measurement that it matters.

Rather than worrying about fastest, think about which way is clearest.

In databases especially, think about which way is going to protect you from data errors.

It doesn't matter how fast your program is if it's buggy or gives incorrect answers.

Andy Lester
  • 91,102
  • 13
  • 100
  • 152
  • Yes, you are right. I also thought about it and '<>1' is safe and clear. However, given the big size of the table and given that status will not have any other values, I thought about doing performance evaluation. – Ethan Jan 19 '14 at 02:38
  • Yes, doing a performance evaluation is a fantastic idea. It will give you a much more definitive answer than asking a bunch of random people about an abstract example in undefined tables on an unnamed database. – Andy Lester Jan 19 '14 at 02:39
  • I understand now where the query was cryptic. I added the note that index is only on status field. Hope it's clear now. – Ethan Jan 19 '14 at 02:43
1

How many rows have the value "1"? If less than ~20%, you will get a table scan regardless of how you formulate the WHERE (IN, <>, BETWEEN). That's assuming you have INDEX(status).

But indexing ENUMs, flags, and other things with poor cardinality is rarely useful.

An IN clause with 50K items causes memory problems (or at least used to), but not performance problems. They are sorted, and a binary search is used.

Rule of Thumb: The cost of evaluation of expressions (IN, <>, functions, etc) is mostly irrelevant in performance. The main cost is fetching the rows, especially if they need to be fetched from disk.

An INDEX may assist in minimizing the number of rows fetched.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks Rick. Good points. The reason I wanted to know about performance is the mention of IN vs OR performance here: http://stackoverflow.com/questions/782915 – Ethan Jan 21 '14 at 18:17
0

you can use BENCHMARK() to test it yourself.

http://sqlfiddle.com/#!2/d41d8/29606/2

the first one if faster which makes sense since it only has to compare 1 number instead of 4 numbers.

Tin Tran
  • 6,194
  • 3
  • 19
  • 34
  • This is fantastic link! Thanks for showing me that. Yes, <> 1 makes sense logically also. – Ethan Jan 19 '14 at 02:40
  • The only test that means a damn thing is the one that **you** run on **your** machine on **your** database using **your** tables filled with **your** data. – Andy Lester Jan 19 '14 at 02:42
  • you can also use @var BETWEEN 2 AND 5 which also makes sense depending on your data maybe future statuses might have other values in which case if you don't want to include them you can use BETWEEN if you know you want all statuses greater than 1 you can use > 1 as well – Tin Tran Jan 19 '14 at 02:42
  • Optimizers are very smart things, and they will take into account the distribution of values in the indexes. – Andy Lester Jan 19 '14 at 02:44
  • @TinTran Refer this link http://stackoverflow.com/questions/782915 for some performance eval. – Ethan Jan 19 '14 at 02:45