2

There is a query i would like to improve the performance. The tables have close to 10 million records. The exisiting indexes of both the tables are given below.

DISB_A TABLE INDEX:

INDEX_OWNER  INDEX_NAME     TABLE_OWNER   TABLE_NAME  COLUMN_NAME COLUMN_POSITION  COLUMN_LENGTH  CHAR_LENGTH   DESCEND  COLLATED_COLUMN_ID
-------------------------------------------------------------------------------------------------------------------------------------------
DEVL         DISB_A_INDX1   DEVL          DISB_A      V_ID         1               22             0             ASC 
DEVL         DISB_A_INDX1   DEVL          DISB_A      SEQNBR       2               22             0             ASC 
DEVL         DISB_A_INDX2   DEVL          DISB_A      ND_ID        1               22             0             ASC 
DEVL         DISB_A_INDX2   DEVL          DISB_A      DUE_DATE     2               7              0             ASC 
DEVL         DISB_A_INDX3   DEVL          DISB_A      WL_ID        1               22             0             ASC 
DEVL         DISB_A_INDX4   DEVL          DISB_A      CODE         1               8              8             ASC 
DEVL         DISB_A_PRIME   DEVL          DISB_A      ID           1               22             0             ASC 

DISB_B TABLE INDEX:

INDEX_OWNER   INDEX_NAME  TABLE_OWNER TABLE_NAME    COLUMN_NAME    COLUMN_POSITION  COLUMN_LENGTH   CHAR_LENGTH DESCEND COLLATED_COLUMN_ID
-------------------------------------------------------------------------------------------------------------------------------------
DEVL         DISB_B_INDX1   DEVL       DISB_B        DSBA_ID        1                  22                0        ASC   
DEVL         DISB_B_INDX1   DEVL       DISB_B        SEQNBR         2                  22                0        ASC   
DEVL         DISB_B_PRIME   DEVL       DISB_B        DSBA_ID        1                  22                0        ASC   
DEVL         DISB_B_PRIME   DEVL       DISB_B        ND_ID          2                  22                0        ASC   
DEVL         DISB_B_PRIME   DEVL       DISB_B        A_ID           3                  13                13       ASC   
DEVL         DISB_B_PRIME   DEVL       DISB_B        DIO_ID         4                  6                 6        ASC   
DEVL         DISB_B_PRIME   DEVL       DISB_B        AIO_QUAL       5                  22                0        ASC   
DEVL         DISB_B_PRIME   DEVL       DISB_B        DMT_CODE       6                  3                 3        ASC   
DEVL         DISB_B_PRIME   DEVL       DISB_B        DMT_SEQNBR     7                  22                0        ASC   
DEVL         DISB_B_PRIME   DEVL       DISB_B        ORDER_SEQNBR   8                  22                0        ASC   
DEVL         DISB_B_PRIME   DEVL       DISB_B        NBR            9                  20                20       ASC   

With the exsisting index when i checked the explain plan the cost was very high and the record used to take close to 1 second to 2 second to get displayed and the operation used to be FULL Table scan. Please find the details below.

SQL> explain plan 
     for 
SELECT SUM ( NVL ( dd.req_amt , 0 ) )
FROM DISB_A db ,
     DISB_B dd
WHERE db.id = dd.dsba_id
AND dd.nd_id = xxxxxxxx
AND dd.a_id = 'xx-xx'
AND DECODE (db.v_id , xxxxxxxxx , 'COMPLETE' , db.code ) = 'COMPLETE'
AND db.effdate BETWEEN TRUNC ( SYSDATE , 'YEAR' ) AND SYSDATE; 
 Explained.

 Elapsed: 00:00:00.04
 SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
Plan hash value: 4272128008
--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |     1 |    58 | 62271   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE                |                 |     1 |    58 |            |          |
|*  2 |   FILTER                       |                 |       |       |            |          |
|   3 |    NESTED LOOPS                |                 |     1 |    58 | 62271   (1)| 00:00:03 |
|   4 |     NESTED LOOPS               |                 |    10 |    58 | 62271   (1)| 00:00:03 |
|*  5 |      TABLE ACCESS STORAGE FULL | DISB_A          |    10 |   300 | 62231   (1)| 00:00:03 |
|*  6 |      INDEX RANGE SCAN          | DISB_B_PRIME    |     1 |       |     3   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID| DISB_B          |     1 |    28 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYSDATE@!>=TRUNC(SYSDATE@!,'fmyear'))
   5 - storage(DECODE("DB"."EV_ID",1263421688,'COMPLETE',"DB"."STATUS_CODE")='COMPLETE'
              AND "DB"."EFFDATE">=TRUNC(SYSDATE@!,'fmyear') AND "DB"."EFFDATE"<=SYSDATE@!)
       filter(DECODE("DB"."EV_ID",1263421688,'COMPLETE',"DB"."STATUS_CODE")='COMPLETE'
              AND "DB"."EFFDATE">=TRUNC(SYSDATE@!,'fmyear') AND "DB"."EFFDATE"<=SYSDATE@!)
   6 - access("DB"."ID"="DD"."DSBA_ID" AND "DD"."IND_ID"=20972265 AND
              "DD"."GA_ID"='150563-01')

25 rows selected.

Elapsed: 00:00:00.02

SQL> SELECT SUM ( NVL ( dd.req_amt , 0 ) )
FROM DISB_A db ,
     DISB_B dd
WHERE db.id = dd.dsba_id
AND dd.nd_id = xxxxxxxx
AND dd.a_id = 'xx-xx'
AND DECODE (db.v_id , xxxxxxxxx , 'COMPLETE' , db.code ) = 'COMPLETE'
AND db.effdate BETWEEN TRUNC ( SYSDATE , 'YEAR' ) AND SYSDATE; 

 SUM(NVL(DD.REQ_AMT,0))
 ______________________
             62500

  **1 row selected.
  Elapsed: 00:00:01.65**

So in the previous post I was asked to add another index so that it will reduce the cost and the query might come out faster. Please find the index that i added below.

 **CREATE INDEX DISB_A_INDX5 ON DISB_A (ID,IND_id,EV_ID,status_code,EFFDATE ASC) NOPARALLEL;**

 PLAN_TABLE_OUTPUT
 Plan hash value: 2535999045

-----------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                  |     1 |    58 | 18669   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE                  |                  |     1 |    58 |            |          |
|*  2 |   FILTER                         |                  |       |       |            |          |
|   3 |    NESTED LOOPS                  |                  |     1 |    58 | 18669   (1)| 00:00:01 |
|   4 |     NESTED LOOPS                 |                  |    35 |    58 | 18669   (1)| 00:00:01 |
|*  5 |      INDEX STORAGE FAST FULL SCAN| DISB_A_INDX5     |    35 |  1050 | 18598   (1)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN            | DISB_B_PRIME     |     1 |       |     3   (0)| 00:00:01 |
|   7 |     TABLE ACCESS BY INDEX ROWID  | DISB_B           |     1 |    28 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYSDATE@!>=TRUNC(SYSDATE@!,'fmyear'))
   5 - storage(DECODE("DB"."EV_ID",1263421688,'COMPLETE',"DB"."STATUS_CODE")='COMPLETE' AND
              "DB"."EFFDATE"<=SYSDATE@! AND "DB"."EFFDATE">=TRUNC(SYSDATE@!,'fmyear'))
       filter(DECODE("DB"."EV_ID",1263421688,'COMPLETE',"DB"."STATUS_CODE")='COMPLETE' AND
              "DB"."EFFDATE"<=SYSDATE@! AND "DB"."EFFDATE">=TRUNC(SYSDATE@!,'fmyear'))
   6 - access("DB"."ID"="DD"."DSBA_ID" AND "DD"."IND_ID"=20972265 AND
              "DD"."GA_ID"='150563-01')

25 rows selected.

Elapsed: 00:00:00.03


SQL> SELECT SUM ( NVL ( dd.req_amt , 0 ) )
FROM DISB_A db ,
     DISB_B dd
WHERE db.id = dd.dsba_id
AND dd.nd_id = xxxxxxxx
AND dd.a_id = 'xx-xx'
AND DECODE (db.v_id , xxxxxxxxx , 'COMPLETE' , db.code ) = 'COMPLETE'
AND db.effdate BETWEEN TRUNC ( SYSDATE , 'YEAR' ) AND SYSDATE; 

SUM(NVL(DD.REQ_AMT,0))
______________________
                 62500

**1 row selected.
Elapsed: 00:00:05.45**

Question :

  1. When the cost reduced actually the query should have come faster. But it took 6 seconds for the query to come with the result. Can you please explain if i am doing something wrong/is there any other way to improve the performance.

  2. The number of rows picked during FULL TABLE SCAN before adding the additional index was 10 rows but after adding the additional index the number of rows jumped from 10 to 35.

Vimal Bhaskar
  • 748
  • 1
  • 5
  • 17
  • You asked essentially the same question earlier (https://stackoverflow.com/questions/65580294/performance-tuning-a-decode-statement-in-a-where-clause) and the answer recommended two indexes, neither of which is on this table. I would suggest that you delete this question until you try the answer in that question. – Gordon Linoff Jan 06 '21 at 11:55
  • You can see that i have added that index in the question itself. I have also included the create Index statement as well. – Vimal Bhaskar Jan 06 '21 at 11:57
  • . . I see no index where the first two columns are `nd_id` and `a_id`. And similarly for the second recommended index. – Gordon Linoff Jan 06 '21 at 11:59
  • Dont you think I need not add the Index for the disb_b table because the cost / index utilization is correct there. The issue according to me (I might be wrong) is with the disb_A table. So shouldnt be focus on the index in table A than B ? – Vimal Bhaskar Jan 06 '21 at 12:04
  • Cost is an estimate using your query, statistics and optimizer math. It sometimes will end up non-representative of reality. What does `xxxxxxxxx` mean? Judging by your plan predicates you are putting in a literal value here, making your decode impossible to index - you should think about rewriting this (the decode will also prove difficult for the optimizer to estimate with). If the cardinality estimates are sane (and only a small amount of rows in table `DISB_A` matches your predicates) then you would get huge improvements to the performance by rewriting so an index could be used properly. – Andrew Sayer Jan 06 '21 at 13:07
  • Hi @AndrewSayer Thanks for the reply. But the xxxxx denotes the Bind variable. I cannot use the bind variable when i create a index. Can you please provide an example ? Would be great if you could help me. – Vimal Bhaskar Jan 06 '21 at 13:26
  • Well a correct rewrite for `DECODE (db.v_id , :xxxxxxxxx , 'COMPLETE' , db.code ) = 'COMPLETE'` would be: `(db.v_id = :xxxxxxxxx or (db.v_id is null and :xxxxxxxxx is null) or db.code = 'COMPLETE' )` which would benefit from `or expansion` with indexes on `v_id` and `code`, if you combine those with the other columns that are useful in reducing the selectivity against `DISB_A` you'll probably get a very fast execution time. – Andrew Sayer Jan 06 '21 at 13:45
  • @AndrewSayer i did try that but did not see a the improvments in the execution plan. But do you think it might improve ? – Vimal Bhaskar Jan 06 '21 at 13:48
  • It's hard to comment without seeing the execution plan it caused. Was there or expansion? Did you create indexes that include the columns responsible for the main selectivity? You haven't confirmed that the low cardinality estimates are correct, are they? – Andrew Sayer Jan 06 '21 at 14:14
  • Sorry but what does this mean "You haven't confirmed that the low cardinality estimates are correct, are they?" ? Sorry basically i am new to plsql and learning. – Vimal Bhaskar Jan 06 '21 at 14:17
  • The optimizer is expecting to find 10 rows in `DISB_A` in the first query and 35 in the second query based on your predicates. Does that sound about right? – Andrew Sayer Jan 06 '21 at 14:21

1 Answers1

0

Based on the assumption that or expansion probably isn't happening on it's own by just replacing the decode. And assuming that the driving table should be DISB_A (it would be a whole lot easier to drive the query from DISB_B so let us know if the standalone filters against DISB_B are effective at cutting down the number of rows visited). I would rewrite the query to (there may be typos):

select sum(req_amt) 
from (
SELECT SUM ( dd.req_amt  ) req_amt
FROM DISB_A db ,
     DISB_B dd
WHERE db.id = dd.dsba_id
AND dd.nd_id = :xxxxxxxx
AND dd.a_id = 'xx-xx'
AND db.effdate BETWEEN TRUNC ( SYSDATE , 'YEAR' ) AND SYSDATE
and db.v_id = :xxxxxxxxx 
union all
SELECT SUM ( dd.req_amt  ) req_amt
FROM DISB_A db ,
     DISB_B dd
WHERE db.id = dd.dsba_id
AND dd.nd_id = :xxxxxxxx
AND dd.a_id = 'xx-xx'
AND db.effdate BETWEEN TRUNC ( SYSDATE , 'YEAR' ) AND SYSDATE
and (db.v_id is null and :xxxxxxxxx is null)
union all
SELECT SUM ( dd.req_amt  ) req_amt
FROM DISB_A db ,
     DISB_B dd
WHERE db.id = dd.dsba_id
AND dd.nd_id = :xxxxxxxx
AND dd.a_id = 'xx-xx'
AND db.effdate BETWEEN TRUNC ( SYSDATE , 'YEAR' ) AND SYSDATE
and  db.code = 'COMPLETE'
and  lnnvl(db.v_id = :xxxxxxxxx )
    )

And create the following indexes:

create index DISB_A_idx_v on DISB_A (v_id, effdate);
create index DISB_A_idx_code on DISB_A (code, effdate);
create index DISB_B_idx on DISB_B (nd_id,a_id,dsba_id);

If this is still relatively slow then it would be helpful to have some feedback about how selective these filters really are. Have a look at how to get row source execution statistics inside the plan (point 4 of https://ctandrewsayer.wordpress.com/2017/03/21/4-easy-lessons-to-enhance-your-performance-diagnostics/) and include that information in your original question.

Andrew Sayer
  • 2,296
  • 1
  • 7
  • 9
  • This would actually duplicate the records in the first place. The logic written above and the decode are not same. Decode would pick either the v_id if it matches or the status_code amd check if it is complete. But we are taking records from both the conditions in the above solution. – Vimal Bhaskar Jan 07 '21 at 06:27
  • I don't think so, the first part will suceed if v_id matches (implicitly requiring both to be not null), the second if v_id is null and the input is null, the third if code is complete and v_id doesn't match the input (or if just one is null). Have you got an example input and data that would lead to doulbe counting? – Andrew Sayer Jan 07 '21 at 11:57