22

I'm developing an application which processes many data in Oracle database.
In some case, I have to get many object based on a given list of conditions, and I use SELECT ...FROM.. WHERE... IN..., but the IN expression just accepts a list whose size is maximum 1,000 items.

So I use OR expression instead, but as I observe -- perhaps this query (using OR) is slower than IN (with the same list of condition). Is it right? And if so, how to improve the speed of query?

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
Đinh Hồng Châu
  • 5,300
  • 15
  • 53
  • 90
  • Is the list static or derived from a query? – Phil Jun 29 '11 at 01:47
  • No, the list of value to query was retrieved from external resource. Is there any way to solve this issue, because my list is too large, may contain over 100000 items – Đinh Hồng Châu Jun 29 '11 at 01:52
  • 1
    So, you're creating a massive query string containing something like IN (...9997, 9998, 9999, 1000,1001 ...) ? That in itself is going to cost a lot, to transmit and parse. Never mind the sql injection possibilities. – O. Jones Jun 29 '11 at 11:51

5 Answers5

29

IN is preferable to OR -- OR is a notoriously bad performer, and can cause other issues that would require using parenthesis in complex queries.

Better option than either IN or OR, is to join to a table containing the values you want (or don't want). This table for comparison can be derived, temporary, or already existing in your schema.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
  • 1
    No, I just query on only one table. My list may contains too much items, therefore I can't use IN. I tried to split the list into smaller parts and do query on a batch of sublists, but later I have to order the data on memory, it's very slow. – Đinh Hồng Châu Jun 29 '11 at 01:55
  • Aren't IN and OR the same? That is, IN expands out to OR anyway? This is why NOT IN with NULL fails – gbn Jun 29 '11 at 04:11
  • @gbn: Logically, yes. But `IN` is optimized vs using `OR` -- it's more than syntactic sugar. – OMG Ponies Jun 29 '11 at 04:28
  • Can you provide any proof that IN is more than syntactic sugar? I tested and posted results in my answer. – jva Jul 27 '12 at 14:48
  • @jva: It's general policy to downvote *after* someone responds, not *before*. Frankly, this comes off as trying to advertise your answer. Keep in mind that answers do age - newer versions can mean the behavior is not guaranteed. – OMG Ponies Jul 28 '12 at 02:16
7

In this scenario I would do this:

  1. Create a one column global temporary table
  2. Populate this table with your list from the external source (and quickly - another whole discussion)
  3. Do your query by joining the temporary table to the other table (consider dynamic sampling as the temporary table will not have good statistics)

This means you can leave the sort to the database and write a simple query.

WW.
  • 23,793
  • 13
  • 94
  • 121
5

Oracle internally converts IN lists to lists of ORs anyway so there should really be no performance differences. The only difference is that Oracle has to transform INs but has longer strings to parse if you supply ORs yourself.

Here is how you test that.

CREATE TABLE my_test (id NUMBER);

SELECT 1 
FROM my_test
WHERE id IN (1,2,3,4,5,6,7,8,9,10,
             21,22,23,24,25,26,27,28,29,30,
             31,32,33,34,35,36,37,38,39,40,
             41,42,43,44,45,46,47,48,49,50,
             51,52,53,54,55,56,57,58,59,60,
             61,62,63,64,65,66,67,68,69,70,
             71,72,73,74,75,76,77,78,79,80,
             81,82,83,84,85,86,87,88,89,90,
             91,92,93,94,95,96,97,98,99,100
             );

SELECT sql_text, hash_value
FROM v$sql 
WHERE sql_text LIKE '%my_test%';

SELECT operation, options, filter_predicates
FROM v$sql_plan
WHERE hash_value = '1181594990'; -- hash_value from previous query

SELECT STATEMENT
TABLE ACCESS FULL ("ID"=1 OR "ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5 OR "ID"=6 OR "ID"=7 OR "ID"=8 OR "ID"=9 OR "ID"=10 OR "ID"=21 OR "ID"=22 OR "ID"=23 OR "ID"=24 OR "ID"=25 OR "ID"=26 OR "ID"=27 OR "ID"=28 OR "ID"=29 OR "ID"=30 OR "ID"=31 OR "ID"=32 OR "ID"=33 OR "ID"=34 OR "ID"=35 OR "ID"=36 OR "ID"=37 OR "ID"=38 OR "ID"=39 OR "ID"=40 OR "ID"=41 OR "ID"=42 OR "ID"=43 OR "ID"=44 OR "ID"=45 OR "ID"=46 OR "ID"=47 OR "ID"=48 OR "ID"=49 OR "ID"=50 OR "ID"=51 OR "ID"=52 OR "ID"=53 OR "ID"=54 OR "ID"=55 OR "ID"=56 OR "ID"=57 OR "ID"=58 OR "ID"=59 OR "ID"=60 OR "ID"=61 OR "ID"=62 OR "ID"=63 OR "ID"=64 OR "ID"=65 OR "ID"=66 OR "ID"=67 OR "ID"=68 OR "ID"=69 OR "ID"=70 OR "ID"=71 OR "ID"=72 OR "ID"=73 OR "ID"=74 OR "ID"=75 OR "ID"=76 OR "ID"=77 OR "ID"=78 OR "ID"=79 OR "ID"=80 OR "ID"=81 OR "ID"=82 OR "ID"=83 OR "ID"=84 OR "ID"=85 OR "ID"=86 OR "ID"=87 OR "ID"=88 OR "ID"=89 OR "ID"=90 OR "ID"=91 OR "ID"=92 OR "ID"=93 OR "ID"=94 OR "ID"=95 OR "ID"=96 OR "ID"=97 OR "ID"=98 OR "ID"=99 OR "ID"=100)

jva
  • 2,797
  • 1
  • 26
  • 41
2

If you create the table with a primary key:

CREATE TABLE my_test (id NUMBER,
CONSTRAINT PK PRIMARY KEY (id));

and go through the same SELECTs to run the query with the multiple IN values, followed by retrieving the execution plan via hash value, what you get is:

SELECT STATEMENT
INLIST ITERATOR
INDEX                  RANGE SCAN

This seems to imply that when you have an IN list and are using this with a PK column, Oracle keeps the list internally as an "INLIST" because it is more efficient to process this, rather than converting it to ORs as in the case of an un-indexed table.

I was using Oracle 10gR2 above.

aizaz
  • 3,056
  • 9
  • 25
  • 57
TrevorH
  • 21
  • 1
2

I would question the whole approach. The client of the SP has to send 100000 IDs. Where does the client get those IDs from? Sending such a large number of ID as the parameter of the proc is going to cost significantly anyway.

oazabir
  • 1,599
  • 9
  • 15