In the example below, the Oracle optimizer's estimated row count is off by two orders of magnitude. How do I improve the estimate?
Table A has rows with numbers 1 through 1,000 for each of the 10 letters A through J. Table C has 100 copies of table A. So, table A has a cardinality of 10K and table C has a cardinality of 1M.
A given single-value predicate on the number yields 1/1,000 of the rows in table A (same for table C). A given single-value predicate on the letter yields 1/10 of the rows in table A (same for table C).
Setup script.
drop table C;
drop table A;
create table A
( num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into A (num, val, pad)
select mod(level-1, 1000) +1
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A'))
, ' '
from dual
connect by level <= 10*1000
;
create table C
( id NUMBER
, num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into C (id, num, val, pad)
with
"D1" as
( select /*+ materialize */ null from dual connect by level <= 100 --320
)
, "D" as
( select /*+ materialize */
level rn
, mod(level-1, 1000) + 1 num
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A')) val
, ' ' pad
from dual
connect by level <= 10*1000
order by 1 offset 0 rows
)
select rownum id
, num num
, val val
, pad pad
from "D1", "D"
;
commit;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
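For reference, the cardinalities and per-predicate selectivities can be sanity-checked after the load. A minimal sketch, assuming the setup script above completed as shown (the counts in the comments are what the data generation should produce):
-- expected: A = 10,000 rows, C = 1,000,000 rows
select count(*) from A;
select count(*) from C;
-- expected: 10 rows in A (1/1,000 of 10,000), 1,000 rows in C (1/1,000 of 1,000,000)
select count(*) from A where num = 1;
select count(*) from C where num = 1;
-- expected: 1,000 rows in A (1/10 of 10,000), 100,000 rows in C (1/10 of 1,000,000)
select count(*) from A where val = 'A';
select count(*) from C where val = 'A';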
Consider the explain plan for the following query.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num = 1
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 1 | 47 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 100 | 5200 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."NUM"=1 AND "A"."VAL"='A')
3 - filter("C"."NUM"=1 AND "C"."VAL"='A')
The row cardinality of each step makes sense to me.
ID=2 --> (1/1,000) * (1/10) * 10,000 = 1
ID=3 --> (1/1,000) * (1/10) * 1,000,000 = 100
ID=1 --> 100 is correct. The filters in ID=2 and ID=3 are the same, so every row from the ID=3 row source matches the single row from ID=2, giving 100 joined rows.
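The estimates can be compared against the actual row counts by executing the statement with the gather_plan_statistics hint and then reporting the cursor with DBMS_XPLAN. A minimal sketch (the 'ALLSTATS LAST' format is one common choice for showing E-Rows next to A-Rows):
select /*+ gather_plan_statistics */ *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num = 1
and A.val = 'A'
;
-- E-Rows vs A-Rows, line by line, for the statement just executed in this session
select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));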
Now consider the explain plan for the slightly modified query below.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 2 | 94 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 200 | 10400 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."VAL"='A' AND ("A"."NUM"=1 OR "A"."NUM"=2))
3 - filter("C"."VAL"='A' AND ("C"."NUM"=1 OR "C"."NUM"=2))
The row cardinalities of steps ID=2 and ID=3 make sense to me, but now ID=1 is off by two orders of magnitude.
ID=2 --> (2/1,000) * (1/10) * 10,000 = 2
ID=3 --> (2/1,000) * (1/10) * 1,000,000 = 200
ID=1 --> The optimizer estimates 2 rows, but each of the 2 rows from ID=2 matches 100 rows from ID=3, so the actual is 200: two orders of magnitude more than estimated.
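A direct count confirms the actual join cardinality; a minimal sketch:
-- expected: 200 (each of the 2 matching rows in A joins to 100 copies in C)
select count(*)
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val = 'A'
;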
Adding unique and foreign key constraints and extended statistics did not improve the estimated row counts.
create unique index IU_A on A (num, val);
alter table A add constraint UK_A unique (num, val) rely using index IU_A enable validate;
alter table C add constraint R_C foreign key (num, val) references A (num, val) rely enable validate;
create index IR_C on C (num, val);
select dbms_stats.create_extended_stats(null,'A','(num, val)') from dual;
select dbms_stats.create_extended_stats(null,'C','(num, val)') from dual;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
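After re-gathering, the column-group extensions and their statistics can be verified from the standard dictionary views; a minimal sketch (output not shown here):
-- the (num, val) column groups should appear as SYS_STU... virtual columns
select table_name, extension_name, extension
from user_stat_extensions
where table_name in ('A', 'C');
-- num_distinct for the (num, val) column group on C should be about 10,000
select table_name, column_name, num_distinct, histogram
from user_tab_col_statistics
where table_name in ('A', 'C');
With the constraints, indexes, and column groups in place, the plan for the IN-list query becomes: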
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 2 | 198 | 10 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| A | 2 | 94 | 5 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IU_A | 2 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access(("A"."NUM"=1 OR "A"."NUM"=2) AND "A"."VAL"='A')
6 - access("A"."NUM"="C"."NUM" AND "C"."VAL"='A')
filter("C"."NUM"=1 OR "C"."NUM"=2)
What do I need to do to make the estimated rows better match reality?
Using Oracle Enterprise Edition 19c.
Thanks in advance.
Edit: After ensuring the most recent optimizer_features_enable was used and modifying one of the predicates, we still have an explain plan whose estimated row count is short by two orders of magnitude.
ID=6 ought to have an estimated row count of 100. It seems the optimizer is applying the predicate selectivity twice: once for the access predicate and again for the filter predicate (a rough back-of-the-envelope check follows the plan below).
select /*+ optimizer_features_enable('19.1.0') */
*
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val in('A','B')
;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 396 | 16 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| A | 4 | 188 | 7 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IU_A | 4 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."NUM"=1 OR "A"."NUM"=2)
filter("A"."VAL"='A' OR "A"."VAL"='B')
6 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
filter(("C"."NUM"=1 OR "C"."NUM"=2) AND ("C"."VAL"='A' OR "C"."VAL"='B'))