197

When dealing with big databases, which performs better: IN or OR in the SQL WHERE clause?

Is there any difference about the way they are executed?

informatik01
  • 16,038
  • 10
  • 74
  • 104
felix
  • 11,304
  • 13
  • 69
  • 95
  • My first guess would be that OR performs better, unless the SQL engine converts IN into OR behind the scene. Have you seen the query plan of these two? – Raj Jun 19 '10 at 07:19
  • Possible duplicate of [MYSQL OR vs IN performance](http://stackoverflow.com/questions/782915/mysql-or-vs-in-performance) – Steve Chambers Nov 08 '16 at 08:30

8 Answers8

235

I assume you want to know the performance difference between the following:

WHERE foo IN ('a', 'b', 'c')
WHERE foo = 'a' OR foo = 'b' OR foo = 'c'

According to the manual for MySQL if the values are constant IN sorts the list and then uses a binary search. I would imagine that OR evaluates them one by one in no particular order. So IN is faster in some circumstances.

The best way to know is to profile both on your database with your specific data to see which is faster.

I tried both on a MySQL with 1000000 rows. When the column is indexed there is no discernable difference in performance - both are nearly instant. When the column is not indexed I got these results:

SELECT COUNT(*) FROM t_inner WHERE val IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000);
1 row fetched in 0.0032 (1.2679 seconds)

SELECT COUNT(*) FROM t_inner WHERE val = 1000 OR val = 2000 OR val = 3000 OR val = 4000 OR val = 5000 OR val = 6000 OR val = 7000 OR val = 8000 OR val = 9000;
1 row fetched in 0.0026 (1.7385 seconds)

So in this case the method using OR is about 30% slower. Adding more terms makes the difference larger. Results may vary on other databases and on other data.

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • 24
    If the optimizer is worth its salt they should perform the same. – Janick Bernet Jun 19 '10 at 07:30
  • 35
    @inflagranti: No optimizer is perfect unfortunately. Optimizers are extremely complex programs and each implementation will have its own strengths and weaknesses. This is why I say you should profile on a specific implementation. I'd imagine that the extra structure of the `IN` method makes it easier to optimize than a whole bunch of possibly related `OR` clauses. I'd be surprised if there is an engine where the `OR` method is faster, but I'm not surprised that there are times when OR is slower. – Mark Byers Jun 19 '10 at 08:12
  • 5
    @MarkByers Couldn't the optimizer always substitute multiple `OR`s with an `IN`? – tymtam Sep 19 '16 at 22:23
41

The best way to find out is looking at the Execution Plan.


I tried it with Oracle, and it was exactly the same.

CREATE TABLE performance_test AS ( SELECT * FROM dba_objects );

SELECT * FROM performance_test
WHERE object_name IN ('DBMS_STANDARD', 'DBMS_REGISTRY', 'DBMS_LOB' );

Even though the query uses IN, the Execution Plan says that it uses OR:

--------------------------------------------------------------------------------------    
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |    
--------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT  |                  |     8 |  1416 |   163   (2)| 00:00:02 |    
|*  1 |  TABLE ACCESS FULL| PERFORMANCE_TEST |     8 |  1416 |   163   (2)| 00:00:02 |    
--------------------------------------------------------------------------------------    

Predicate Information (identified by operation id):                                       
---------------------------------------------------                                       

   1 - filter("OBJECT_NAME"='DBMS_LOB' OR "OBJECT_NAME"='DBMS_REGISTRY' OR                
              "OBJECT_NAME"='DBMS_STANDARD')                                              
Peter Lang
  • 54,264
  • 27
  • 148
  • 161
  • 1
    What happens in Oracle if you have more than 3 values that you are testing? Do you know if Oracle is unable to perform the same binary search optimization as MySQL or does it perform it in both cases? – Mark Byers Jun 19 '10 at 08:59
  • 2
    @Mark Byers: I tried the same query with 10 values, still the same result. Note, that the optimizer resorted my values in alphabetical order. I would not be surprised if Oracle did some internal optimization of that filter... – Peter Lang Jun 19 '10 at 09:49
  • 5
    Oracle also has an `INLIST ITERATOR` operation, which it would select if there were an index it could use. Still, when I tried it out, both `IN` and `OR` end up with the same execution plan. – Cheran Shunmugavel Jun 19 '10 at 15:47
12

The OR operator needs a much more complex evaluation process than the IN construct because it allows many conditions, not only equals like IN.

Here is a list of what you can use with OR but that are not compatible with IN: greater, greater or equal, less, less or equal, LIKE and some more like the oracle REGEXP_LIKE. In addition, consider that the conditions may not always compare the same value.

For the query optimizer it's easier to to manage the IN operator because is only a construct that defines the OR operator on multiple conditions with = operator on the same value. If you use the OR operator the optimizer may not consider that you're always using the = operator on the same value and, if it doesn't perform a deeper and more complex elaboration, it could probably exclude that there may be only = operators for the same values on all the involved conditions, with a consequent preclusion of optimized search methods like the already mentioned binary search.

[EDIT] Probably an optimizer may not implement optimized IN evaluation process, but this doesn't exclude that one time it could happen(with a database version upgrade). So if you use the OR operator that optimized elaboration will not be used in your case.

Alessandro Rossi
  • 2,432
  • 17
  • 24
6

I think oracle is smart enough to convert the less efficient one (whichever that is) into the other. So I think the answer should rather depend on the readability of each (where I think that IN clearly wins)

soulmerge
  • 73,842
  • 19
  • 118
  • 155
3

I'll add info for PostgreSQL version 11.8 (released 2020-05-14).

IN may be significantly faster. E.g. table with ~23M rows.

Query with OR:

explain analyse select sum(mnozstvi_rozdil)
from product_erecept
where okres_nazev = 'Brno-město' or okres_nazev = 'Pardubice';

-- execution plan
Finalize Aggregate  (cost=725977.36..725977.37 rows=1 width=32) (actual time=4536.796..4540.748 rows=1 loops=1)
  ->  Gather  (cost=725977.14..725977.35 rows=2 width=32) (actual time=4535.010..4540.732 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=724977.14..724977.15 rows=1 width=32) (actual time=4519.338..4519.339 rows=1 loops=3)
              ->  Parallel Bitmap Heap Scan on product_erecept  (cost=15589.71..724264.41 rows=285089 width=4) (actual time=135.832..4410.525 rows=230706 loops=3)
                    Recheck Cond: (((okres_nazev)::text = 'Brno-město'::text) OR ((okres_nazev)::text = 'Pardubice'::text))
                    Rows Removed by Index Recheck: 3857398
                    Heap Blocks: exact=11840 lossy=142202
                    ->  BitmapOr  (cost=15589.71..15589.71 rows=689131 width=0) (actual time=140.985..140.986 rows=0 loops=1)
                          ->  Bitmap Index Scan on product_erecept_x_okres_nazev  (cost=0.00..8797.61 rows=397606 width=0) (actual time=99.371..99.371 rows=397949 loops=1)
                                Index Cond: ((okres_nazev)::text = 'Brno-město'::text)
                          ->  Bitmap Index Scan on product_erecept_x_okres_nazev  (cost=0.00..6450.00 rows=291525 width=0) (actual time=41.612..41.612 rows=294170 loops=1)
                                Index Cond: ((okres_nazev)::text = 'Pardubice'::text)
Planning Time: 0.162 ms
Execution Time: 4540.829 ms

Query with IN:

explain analyse select sum(mnozstvi_rozdil)
from product_erecept
where okres_nazev in ('Brno-město', 'Pardubice');

-- execution plan
Aggregate  (cost=593199.90..593199.91 rows=1 width=32) (actual time=855.706..855.707 rows=1 loops=1)
  ->  Index Scan using product_erecept_x_okres_nazev on product_erecept  (cost=0.56..591477.07 rows=689131 width=4) (actual time=1.326..645.597 rows=692119 loops=1)
        Index Cond: ((okres_nazev)::text = ANY ('{Brno-město,Pardubice}'::text[]))
Planning Time: 0.136 ms
Execution Time: 855.743 ms
Grez
  • 306
  • 3
  • 8
2

OR makes sense (from readability point of view), when there are less values to be compared. IN is useful esp. when you have a dynamic source, with which you want values to be compared.

Another alternative is to use a JOIN with a temporary table.
I don't think performance should be a problem, provided you have necessary indexes.

shahkalpesh
  • 33,172
  • 3
  • 63
  • 88
0

Even though you use the IN operator MS SQL server will automatically convert it to OR operator. If you analyzed the execution plans will able to see this. So better to use it OR if its long IN operator list. it will at least save some nanoseconds of the operation.

Chamath Jeevan
  • 5,072
  • 1
  • 24
  • 27
-4

I did a SQL query in a large number of OR (350). Postgres do it 437.80ms.

Use OR

Now use IN:

Use IN

23.18ms

Fruchtzwerg
  • 10,999
  • 12
  • 40
  • 49
Andrey
  • 9
  • 1