I am having a hard time understanding why the Oracle CBO is behaving the way it does when a bind variable is part of a OR condition.
My environment
Oracle 12.2 over Red Hat Linux 7
HINT. I am just providing a simplification of the query where the problem is located
$ sqlplus / as sysdba
SQL*Plus: Release 12.2.0.1.0 Production on Thu Jun 10 15:40:07 2021
Copyright (c) 1982, 2016, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> @test.sql
SQL> var loanIds varchar2(4000);
SQL> exec :loanIds := '100000018330,100000031448,100000013477,100000023115,100000022550,100000183669,100000247514,100000048198,100000268289';
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.00
SQL> SELECT
2 whs.* ,
3 count(*) over () AS TOTAL
4 FROM ALFAMVS.WHS_LOANS whs
5 WHERE
6 ( nvl(:loanIds,'XX') = 'XX' or
7 loanid IN (select regexp_substr(NVL(:loanIds,''),'[^,]+', 1, level) from dual
8 connect by level <= regexp_count(:loanIds,'[^,]+'))
9 )
10 ;
7 rows selected.
Elapsed: 00:00:18.72
Execution Plan
----------------------------------------------------------
Plan hash value: 2980809427
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6729 | 6748K| 2621 (1)| 00:00:01 |
| 1 | WINDOW BUFFER | | 6729 | 6748K| 2621 (1)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | WHS_LOANS | 113K| 110M| 2621 (1)| 00:00:01 |
|* 4 | FILTER | | | | | |
|* 5 | CONNECT BY WITHOUT FILTERING (UNIQUE)| | | | | |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL(:LOANIDS,'XX')='XX' OR EXISTS (SELECT 0 FROM "DUAL" "DUAL" WHERE
SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1 CONNECT BY LEVEL<=
REGEXP_COUNT (:LOANIDS,'[^,]+')))
4 - filter(SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1)
5 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
Statistics
----------------------------------------------------------
288 recursive calls
630 db block gets
9913 consistent gets
1 physical reads
118724 redo size
13564 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
113003 sorts (memory)
0 sorts (disk)
7 rows processed
SQL> set autotrace off
SQL> select count(*) from ALFAMVS.WHS_LOANS ;
COUNT(*)
----------
113095
1 row selected.
Elapsed: 00:00:00.14
KEY POINTS
- I do know that if I change the OR expression by using two selects and UNION ALL works perfectly. The problem is that I have a lot of conditions done in the same way, so UNION ALL is not a solution in my case.
- The table has statistics up to date calculated with FOR ALL COLUMNS SIZE AUTO and with ESTIMATE PERCENT 10%.
- Dynamic SQL is not a solution in my case, because the query is called through a third party software that uses an API Web to convert the result to JSON.
- I was able to rephrase the regular expression with connect by level in a way that now takes 19 seconds. Before it was taking 40 seconds.
- The table has only 113K records and no indexes.
- The query has 20 conditions of this kind, all written in the same way, as the screen in the web app that triggers the query by the API allows the user to use any combination of parameters or none at all.
If I remove the expression NVL(:loanIds,'XX') = 'XX' OR
, the query takes 0.01 seconds. Why this OR expression with BINDs is giving such headache to the Optimizer ?
-- UPDATE --
I want to thank @Alex Poole for his suggestions and share with him that the third alternative ( removing the regular expressions ) has worked as a charm. It would be great to understand why, though. You have my most sincere gratitude. I used those for a while and I never had this problem. Also, the suggestion to use regexp_like
was even better than the original one with regexp_substr
and connect by level
, but much slower by far than the one where no regular expressions are used at all
Original query
7 rows selected.
Elapsed: 00:00:36.29
New query
7 rows selected.
Elapsed: 00:00:00.58
Once the EXISTS
disappeared of the internal predicate, the query works as fast as hell.
Thank you all for your comments !