5

I have a table (50M rows) which has indexes on column_a and column_b

when I do select count(*) from table where column_a in (list_a), I get in no time my results.

Same with select count(*) from table where column_b in (list_b).

But when I do

select count(*) from table where column_a in (list_a) or column_b in (list_b)

My queries become insanely slow and last half hour before outputting the right number... Am I doing something wrong? How can I optimize the actual behavior of this query?

Thanks!

Plan for query 1:

Plan hash value: 2471097773


-------------------------------------------------------------
| Id  | Operation                    | Name                 |
-------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |
|   1 |  SORT AGGREGATE              |                      |
|   2 |   NESTED LOOPS               |                      |
|   3 |    SORT UNIQUE               |                      |
|   4 |     TABLE ACCESS FULL        | LIST_A               |
|   5 |    BITMAP CONVERSION COUNT   |                      |
|   6 |     BITMAP INDEX SINGLE VALUE| MY_TABLE_IX02        |
-------------------------------------------------------------

Plan for query 2

Plan hash value: 1870911518

-------------------------------------------------------------
| Id  | Operation                    | Name                 |
-------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |
|   1 |  SORT AGGREGATE              |                      |
|   2 |   NESTED LOOPS               |                      |
|   3 |    SORT UNIQUE               |                      |
|   4 |     TABLE ACCESS FULL        | LIST_B               |
|   5 |    BITMAP CONVERSION COUNT   |                      |
|   6 |     BITMAP INDEX SINGLE VALUE| MY_TABLE_IX05        |
-------------------------------------------------------------

Plan for query 3:

Plan hash value: 1821967683

----------------------------------------------------------------
| Id  | Operation                       | Name                 |
----------------------------------------------------------------
|   0 | SELECT STATEMENT                |                      |
|   1 |  SORT AGGREGATE                 |                      |
|   2 |   FILTER                        |                      |
|   3 |    VIEW                         | index$_join$_001     |
|   4 |     HASH JOIN                   |                      |
|   5 |      BITMAP CONVERSION TO ROWIDS|                      |
|   6 |       BITMAP INDEX FULL SCAN    | MY_TABLE_IX02        |
|   7 |      BITMAP CONVERSION TO ROWIDS|                      |
|   8 |       BITMAP INDEX FULL SCAN    | MY_TABLE_IX05        |
|   9 |    TABLE ACCESS FULL            | LIST_A               |
|  10 |    TABLE ACCESS FULL            | LIST_B               |
----------------------------------------------------------------
Stephane Maarek
  • 5,202
  • 9
  • 46
  • 87
  • What is `table`? What is `column_a` and `list_a`? Which indexes do you have on `list_a`? Etc. – Patrick Hofman Feb 26 '15 at 12:48
  • @PatrickHofman, column_a and b are varchar, list_a is 100 values from a select distinct on another table. list_a, list_b have binaries indexes, same as column_a, column_b – Stephane Maarek Feb 26 '15 at 12:48
  • Note that it's different counts, if both a and b is true... – jarlh Feb 26 '15 at 12:49
  • @jarlh - yes, I know that. It's made on purpose, so I don't get duplicates if a row verifies both conditions. The problem lies in the fact that the third query is insanely slow – Stephane Maarek Feb 26 '15 at 12:50
  • 1
    Have you tried `EXPLAIN PLAN`? Does Oracle use your indices? Have you tried adding a combined index on column_a, column_b? – Frank Schmitt Feb 26 '15 at 12:52
  • try to create a compound index on column_a and column_b. – ibre5041 Feb 26 '15 at 12:55
  • @FrankSchmitt - I just edited the question with the actual explain plan for each queries – Stephane Maarek Feb 26 '15 at 12:58
  • 1
    This looks suspicious - `BITMAP INDEX SINGLE VALUE` vs `BITMAP INDEX FULL SCAN`. Did you use the same `IN` clause for both tests? – Frank Schmitt Feb 26 '15 at 12:59
  • I can tell you from personal experience that OR is something you want to avoid at all cost when working with oracle. How fast is 'count in a' + 'count in b' - 'count in a and in b' in comparison? It's three queries, but it could be a lot faster. – Sam Feb 26 '15 at 13:00
  • [This](http://stackoverflow.com/questions/1013797/is-sql-in-bad-for-performance) discussion looks interesting regarding your topic. – Szymon Roziewski Feb 26 '15 at 13:01
  • @FrankSchmitt same in query. Sam I like the idea - I think it should speed up things for the count. The underlying is that I actually need to retrieve the data. count was to illustrate my point. – Stephane Maarek Feb 26 '15 at 13:03
  • In that case you might try and see if a UNION outperforms the OR. – Sam Feb 26 '15 at 13:05
  • Have you tried forcing Oracle to use the indexes with a query hint? In my experience, Oracle has a VERY bad habit of switching to full table scans. Default Oracle settings generally have the index retrieval vs table scan costs weighted wrong. – Necreaux Feb 26 '15 at 13:17
  • @Necreaux - I haven't tried that -- how do you force it? – Stephane Maarek Feb 26 '15 at 13:18
  • You can do it with query hints on a one-off basis. I generally prefer to mess with the optimizer_index_cost_adj setting in my session first since the syntax is way simpler. It might be controversial to some but this setting is generally inappropriate by default: http://www.dba-oracle.com/oracle_tips_cost_adj.htm – Necreaux Feb 26 '15 at 13:22
  • Something I don't understand is why Oracle wouldn't do the following (don't know much about the technology): for the first OR predicate, use the index and output the ROW_ID in a dictionary. For the second OR predicate, use the index and insert each ROW_ID in the previously created dictionary. Use the dictionary to filter the table on the ROW_IDs to keep and output the result. – Stephane Maarek Feb 26 '15 at 13:31
  • It's all about the cost. It looks at the cost vs. the cost of a full table scan, and decides the full table scan is easier/better. Sometimes it is wrong though. – Necreaux Feb 26 '15 at 13:36

2 Answers2

10

In my experience, OR tends to introduce a negative impact on queries (like ignoring indices and triggering full table scans). Sometimes this isn't so bad, but I have had queries that went from lightening fast to taking minutes because of it.

One possible solution is to change the OR into a UNION or even a UNION ALL. I have had success with this in the past to improve the performance of queries, but you will have to compare them to one another to see whether this will work for you.

You can try out the three options below and see if any one of them offers a significant improvement over the others.

Original query (edited to return rows since you mentioned returning data instead of doing a count):

select * from table where column_a in (list_a) or column_b in (list_b)

Query that avoids the OR:

select * from table where column_a in (list_a)
UNION
select * from table where column_b in (list_b)

And since a UNION triggers a DISTINCT, this might be worth trying out as well:

select * from table where column_a in (list_a) and not column_b in (list_b)
UNION ALL
select * from table where column_b in (list_b) and not column_a in (list_a)
UNION ALL
select * from table where column_a in (list_a) and column_b in (list_b)
Sam
  • 1,358
  • 15
  • 24
1

I guess that Oracle is using the indices for your first two queries and doing a full table scan for the third one. If you think about it, it's logical why Oracle cannot use the indices for the third query:

  • an index on column_a will give Oracle information about the values of column_a
  • an index on column_b will give Oracle information about the values of column_b

But neither index will give Oracle information about the combination of (column_a, column_b). So to speedup your query, you'll need an index that includes both column_a and column_b.

Frank Schmitt
  • 30,195
  • 12
  • 73
  • 107
  • An index with both columns will not help with an OR query. – Necreaux Feb 26 '15 at 13:06
  • Why not? At least `INDEX FAST FULL SCAN` could be used. The question does not specify the length of list_a/list_b but according to exec plan the lists must be very long. – ibre5041 Feb 26 '15 at 13:24
  • @Necreaux, does data distribution matter in the OR case? – jarlh Feb 26 '15 at 13:24
  • The lists are not longer than 200 elements – Stephane Maarek Feb 26 '15 at 13:24
  • 2
    @jarlh Theoretically speaking yes. Practically speaking, I'm not sure, but probably not. Say the query is cola='A' and colb='Z' and there is an index on cola,colb. If A is 90% of the data, then it may make sense to traverse the remaining cola values and look for colb='Z' (although a full table scan probably makes more sense here). Now imagine if it is the reverse and A is 1% of the data. It would be much better to use a separate index on colb. At the end of the day, I don't think the optimizer will adjust for these cases, but I'm not 100% sure. – Necreaux Feb 26 '15 at 13:34