22

I have a set of consecutive rows I want to get based upon their primary key, which is an auto-incrementing integer. Assuming that there are no holes, is there any performance between between:

SELECT * FROM `theTable` WHERE `id` IN (n, ... nk); 

and:

SELECT * FROM `theTable` WHERE `id` BETWEEN n AND nk;
pr1001
  • 21,727
  • 17
  • 79
  • 125
  • 1
    both are totally different thing. One checks range like analog signal, another one check states like digital signal. So this dont come for performance comparison – Sadat Jul 22 '10 at 11:27
  • 4
    logically the between should perform better, since it will make two comparisons per element instead of the number of ids in the IN case (*but this is just my feeling .. no hard evidence to support this*) – Gabriele Petrioli Jul 22 '10 at 11:27
  • You should consider reassigning the accepted answer (if that is possible) to LukasEnder. Andomar's answer is wrong and LukasEnder explains why. – Code Commander Oct 06 '12 at 17:34

4 Answers4

19

BETWEEN should outperform IN in this case (but do measure and check execution plans, too!), especially as n grows and as statistics are still accurate. Let's assume:

  • m is the size of your table
  • n is the size of your range

Index can be used (n is tiny compared to m)

  • In theory, BETWEEN can be implemented with a single "range scan" (Oracle speak) on the primary key index, and then traverse at most n index leaf nodes. The complexity will be O(n + log m)

  • IN is usually implemented as a series (loop) of n "range scans" on the primary key index. With m being the size of the table, the complexity will always be O(n * log m) ... which is always worse (neglibile for very small tables m or very small ranges n)

Index cannot be used (n is a significant portion of m)

In any case, you'll get a full table scan and evaluate the predicate on each row:

  • BETWEEN needs to evaluate two predicates: One for the lower and one for the upper bound. The complexity is O(m)

  • IN needs to evaluate at most n predicates. The complexity is O(m * n) ... which is again always worse, or perhaps O(m) if the database can optimise the IN list to be a hashmap, rather than a list of predicates.

Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
  • 1
    I would expect a range scan to be better than a unique scan for scanning ranges. Otherwise, why would Oracle implement the range scan at all? – Andomar Oct 07 '12 at 20:08
  • On mysql IN can have serious performance impacts in comparison to between, I've seen stalls of many seconds where IN contained a few thousand numbers. The same query with BETWEEN takes a few milliseconds. So when you have the choice: always use between – John May 26 '20 at 02:40
  • @John: Never say "always". If your `IN` list has 1-2 elements, I somewhat doubt that your claim is correct. – Lukas Eder May 26 '20 at 09:46
  • @LukasEder Between will be as fast as IN when using 2+ elements, so you lose nothing when using it. And when using 1 element you've no reason to use any of both. The 'always' stays valid no matter if 2 elements or 2 million, the more elements you target the more efficient it will be. There is no reason whatsoever to use IN when BETWEEN can achieve the same, both syntax variants have their purpose. When using IN as replacement for BETWEEN you rape it's purpose. – John May 26 '20 at 12:07
16

a between b and c is a macro that expands to b <= a and a <= c.

a in (b,c,d) is a macro that expands to a=b or a=c or a=d.

Assuming your n and nk are integer, both should end up meaning the same. The between variant should be much faster because it's only two compares, versus nk - n compares for the in variant.

Andomar
  • 232,371
  • 49
  • 380
  • 404
4

I have done research for this question. I have 11M rows in my table. I have executed two queries on that:

Query 1:SELECT * FROM PLAYERS WHERE SCORE BETWEEN 10 TO 20

Query 2:SELECT * FROM PLAYERS WHERE SCORE IN (10,11,...,20)

While execution time, both queries are translated as Andomar said above.

Among both queries, Query 1 is running faster than Query 2.

To know more follow this link:

Performance of BETWEEN VS IN() in MySQL

Thank you.

Community
  • 1
  • 1
Arpita
  • 71
  • 1
  • 4
0

In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(Log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists)

Dhrubo Saha
  • 11
  • 1
  • 4
  • I believe this is a direct quote from "High Performance MySQL: Optimization, Backups, and Replication" is it not? If so, it should be noted as such! – Aaron Francis Nov 16 '21 at 03:10