In the example below, the Oracle optimizer's estimated row count is off by two orders of magnitude. How do I improve the estimate?
Table A has rows with numbers 1 through 1,000 for each of the 10 letters A through J. Table C has 100 copies of table A. So, table A has a cardinality of 10K and table C has a cardinality of 1M.
A given single-value predicate on the number yields 1/1,000 of the rows in table A (same for table C). A given single-value predicate on the letter yields 1/10 of the rows in table A (same for table C).
Setup script.
drop table C;
drop table A;
create table A
( num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into A (num, val, pad)
select mod(level-1, 1000) +1
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A'))
, ' '
from dual
connect by level <= 10*1000
;
create table C
( id NUMBER
, num NUMBER
, val VARCHAR2(3 byte)
, pad CHAR(40 byte)
)
;
insert /*+ append enable_parallel_dml parallel (auto) */
into C (id, num, val, pad)
with
"D1" as
( select /*+ materialize */ null from dual connect by level <= 100 --320
)
, "D" as
( select /*+ materialize */
level rn
, mod(level-1, 1000) + 1 num
, chr(mod(ceil(level/1000) - 1, 10) + ascii('A')) val
, ' ' pad
from dual
connect by level <= 10*1000
order by 1 offset 0 rows
)
select rownum id
, num num
, val val
, pad pad
from "D1", "D"
;
commit;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
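For reference, the cardinalities and per-predicate selectivities can be sanity-checked after the load. A minimal sketch, assuming the setup script above completed as shown (the counts in the comments are what the data generation should produce):
-- expected: A = 10,000 rows, C = 1,000,000 rows
select count(*) from A;
select count(*) from C;
-- expected: 10 rows in A (1/1,000 of 10,000), 1,000 rows in C (1/1,000 of 1,000,000)
select count(*) from A where num = 1;
select count(*) from C where num = 1;
-- expected: 1,000 rows in A (1/10 of 10,000), 100,000 rows in C (1/10 of 1,000,000)
select count(*) from A where val = 'A';
select count(*) from C where val = 'A';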
Consider the explain plan for the following query.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num = 1
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 100 | 9900 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 1 | 47 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 100 | 5200 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."NUM"=1 AND "A"."VAL"='A')
3 - filter("C"."NUM"=1 AND "C"."VAL"='A')
The row cardinality of each step makes sense to me.
ID=2 --> (1/1,000) * (1/10) * 10,000 = 1
ID=3 --> (1/1,000) * (1/10) * 1,000,000 = 100
ID=1 --> 100 is correct. The filters in ID=2 and ID=3 are the same, so every row from the ID=3 row source matches the single row from ID=2, giving 100 joined rows.
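The estimates can be compared against the actual row counts by executing the statement with the gather_plan_statistics hint and then reporting the cursor with DBMS_XPLAN. A minimal sketch (the 'ALLSTATS LAST' format is one common choice for showing E-Rows next to A-Rows):
select /*+ gather_plan_statistics */ *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num = 1
and A.val = 'A'
;
-- E-Rows vs A-Rows, line by line, for the statement just executed in this session
select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));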
Now consider the explain plan for the slightly modified query below.
select *
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val = 'A'
;
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 2 | 198 | 2209 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| A | 2 | 94 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| C | 200 | 10400 | 2185 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
2 - filter("A"."VAL"='A' AND ("A"."NUM"=1 OR "A"."NUM"=2))
3 - filter("C"."VAL"='A' AND ("C"."NUM"=1 OR "C"."NUM"=2))
The row cardinalities of steps ID=2 and ID=3 make sense to me, but now ID=1 is off by two orders of magnitude.
ID=2 --> (2/1,000) * (1/10) * 10,000 = 2
ID=3 --> (2/1,000) * (1/10) * 1,000,000 = 200
ID=1 --> The optimizer estimates 2 rows, but each of the 2 rows from ID=2 matches 100 rows from ID=3, so the actual is 200: two orders of magnitude more than estimated.
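A direct count confirms the actual join cardinality; a minimal sketch:
-- expected: 200 (each of the 2 matching rows in A joins to 100 copies in C)
select count(*)
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val = 'A'
;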
Adding unique and foreign key constraints and extended statistics did not improve the estimated row counts.
create unique index IU_A on A (num, val);
alter table A add constraint UK_A unique (num, val) rely using index IU_A enable validate;
alter table C add constraint R_C foreign key (num, val) references A (num, val) rely enable validate;
create index IR_C on C (num, val);
select dbms_stats.create_extended_stats(null,'A','(num, val)') from dual;
select dbms_stats.create_extended_stats(null,'C','(num, val)') from dual;
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'A', cascade => true);
exec dbms_stats.gather_table_stats(OwnName => null, TabName => 'C', cascade => true);
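After re-gathering, the column-group extensions and their statistics can be verified from the standard dictionary views; a minimal sketch (output not shown here):
-- the (num, val) column groups should appear as SYS_STU... virtual columns
select table_name, extension_name, extension
from user_stat_extensions
where table_name in ('A', 'C');
-- num_distinct for the (num, val) column group on C should be about 10,000
select table_name, column_name, num_distinct, histogram
from user_tab_col_statistics
where table_name in ('A', 'C');
With the constraints, indexes, and column groups in place, the plan for the IN-list query becomes: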
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 198 | 10 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 2 | 198 | 10 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID| A | 2 | 94 | 5 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | IU_A | 2 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access(("A"."NUM"=1 OR "A"."NUM"=2) AND "A"."VAL"='A')
6 - access("A"."NUM"="C"."NUM" AND "C"."VAL"='A')
filter("C"."NUM"=1 OR "C"."NUM"=2)
What do I need to do to make the estimated rows better match reality?
Using Oracle Enterprise Edition 19c.
Thanks in advance.
Edit: After ensuring the most recent optimizer_features_enable was used and modifying one of the predicates, we still have an explain plan whose estimated row count is short by two orders of magnitude.
ID=6 ought to have an estimated row count of 100. It seems the optimizer is applying the predicate selectivity twice: once for the access predicate and again for the filter predicate (a rough back-of-the-envelope check follows the plan below).
select /*+ optimizer_features_enable('19.1.0') */
*
from A
join C
on A.num = C.num
and A.val = C.val
where A.num in(1,2)
and A.val in('A','B')
;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 396 | 16 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 4 | 396 | 16 (0)| 00:00:01 |
| 3 | INLIST ITERATOR | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| A | 4 | 188 | 7 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IU_A | 4 | | 3 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IR_C | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | C | 1 | 52 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."NUM"=1 OR "A"."NUM"=2)
filter("A"."VAL"='A' OR "A"."VAL"='B')
6 - access("A"."NUM"="C"."NUM" AND "A"."VAL"="C"."VAL")
filter(("C"."NUM"=1 OR "C"."NUM"=2) AND ("C"."VAL"='A' OR "C"."VAL"='B'))