0

Wanted to optimize a query with the minus that it takes too much time ... if they can give thanked help. I have a query that runs against a pretty large table,the query takes forever and presumably does a full table scan.This query is very very slow!

select 
transaction_id as APPL_ID
,cast(reason_desc as VARCHAR2(2000)) as APPL_REJECT_REASON_DESC
,cast(reason_code as VARCHAR2(2000)) as APPL_REJECT_REASON
,cast(user_reject as VARCHAR2(100)) as APPL_REJECT_USER
,cast(step_reject as VARCHAR2(100)) as APPL_REJECT_USER_ROLE
,cast(reason_desc_vn as VARCHAR2(2000)) as APPL_REJECT_REASON_DESC_VN
from(
          select wfi.item_id transaction_id
                ,cq1.name as reason_desc
                ,cq1.response as reason_code
                ,COALESCE(wft.executed_by, wft.recipient_shortname) as user_reject
                ,wft.profile_access_right_sname as step_reject              
                ,row_number() over (partition by wfi.item_id order by wft.start_date desc) stt
                ,cq1.name_1 as reason_desc_vn
               --SELECT 1
        from  STA.STA_ACL_WF_INSTANCE wfi 
        inner join STA.STA_ACL_WF_TASK wft 
            on (wfi.item = 'transaction' and wfi.wf_instance_id = wft.wf_instance_id)           
        inner join (
        SELECT t.task_id
        ,LISTAGG (t.response, ', ') WITHIN GROUP (ORDER BY  t.response) as response
        ,LISTAGG (t11.name, ', ') WITHIN GROUP (ORDER BY  t11.name) as  name
        ,LISTAGG (t11.name_1, ', ') WITHIN GROUP (ORDER BY  t11.name_1) as  name_1
        FROM (
        SELECT task_id,
          CAST(TRIM(regexp_substr(t.RESPONSE, '[^,]+', 1, levels.column_value)) AS VARCHAR2(100)) AS RESPONSE
        FROM sta.sta_acl_wf_task_col_quest t,
          table(cast(multiset(select level from dual connect by  level <= length (
                regexp_replace(t.RESPONSE, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
          WHERE t.question in ('Reject Reason','Cancel Reason','Reject Reason For Recommendation','Reject Reason For Approval')
          AND t.response is not null
          ) t 
         LEFT JOIN STA.STA_ACL_STATIC_DATA_TABLE t11 on (t.response= t11.SHORTNAME)
        WHERE 1=1
        GROUP BY t.task_id
        )cq1 on wft.task_id = cq1.task_id
    ) t8 where stt=1

Explain SQL

PLAN_TABLE_OUTPUT                                                                                                                 |
----------------------------------------------------------------------------------------------------------------------------------+
Plan hash value: 3937279373                                                                                                       |
                                                                                                                                  |
-----------------------------------------------------------------------------------------------------------------------------     |
| Id  | Operation                               | Name                      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |     |
-----------------------------------------------------------------------------------------------------------------------------     |
|   0 | SELECT STATEMENT                        |                           |  9560M|    53T|       |  3339M  (2)| 72:27:52 |     |
|*  1 |  VIEW                                   |                           |  9560M|    53T|       |  3339M  (2)| 72:27:52 |     |
|*  2 |   WINDOW NOSORT                         |                           |  9560M|  8431G|       |  3339M  (2)| 72:27:52 |     |
|   3 |    SORT GROUP BY                        |                           |  9560M|  8431G|  8580G|  3339M  (2)| 72:27:52 |     |
|*  4 |     HASH JOIN RIGHT OUTER               |                           |  9560M|  8431G|  3392K|   650M  (1)| 14:07:31 |     |
|   5 |      TABLE ACCESS FULL                  | STA_ACL_STATIC_DATA_TABLE | 19812 |  3153K|       |   399   (3)| 00:00:01 |     |
|   6 |      NESTED LOOPS                       |                           |  7985M|  5830G|       |    40M  (4)| 00:52:07 |     |
|*  7 |       HASH JOIN                         |                           |   488K|   364M|   446M|   958K  (3)| 00:01:15 |     |
|*  8 |        TABLE ACCESS FULL                | STA_ACL_WF_INSTANCE       |  1592K|   428M|       | 77403   (4)| 00:00:07 |     |
|*  9 |        HASH JOIN                        |                           |   986K|   470M|    57M|   787K  (3)| 00:01:02 |     |
|* 10 |         TABLE ACCESS FULL               | STA_ACL_WF_TASK_COL_QUEST |   986K|    46M|       |   396K  (5)| 00:00:32 |     |
|  11 |         TABLE ACCESS FULL               | STA_ACL_WF_TASK           |  5496K|  2364M|       |   139K  (3)| 00:00:11 |     |
|  12 |       COLLECTION ITERATOR SUBQUERY FETCH|                           | 16360 | 32720 |       |    80   (4)| 00:00:01 |     |
|* 13 |        CONNECT BY WITHOUT FILTERING     |                           |       |       |       |            |          |     |
|  14 |         FAST DUAL                       |                           |     1 |       |       |     3   (0)| 00:00:01 |     |
-----------------------------------------------------------------------------------------------------------------------------     |
                                                                                                                                  |
Predicate Information (identified by operation id):                                                                               |
---------------------------------------------------                                                                               |
                                                                                                                                  |
   1 - filter("STT"=1)                                                                                                            |
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "WFI"."ITEM_ID" ORDER BY INTERNAL_FUNCTION("WFT"."START_DATE") DESC                |
              )<=1)                                                                                                               |
   4 - access("T11"."SHORTNAME"(+)=SYS_OP_C2C(CAST(TRIM( REGEXP_SUBSTR ("T"."RESPONSE" /*+ LOB_BY_VALUE */                        |
              ,'[^,]+',1,VALUE(KOKBF$))) AS VARCHAR2(100))))                                                                      |
   7 - access("WFI"."WF_INSTANCE_ID"="WFT"."WF_INSTANCE_ID")                                                                      |
   8 - filter("WFI"."ITEM"=U'transaction')                                                                                        |
   9 - access("WFT"."TASK_ID"="TASK_ID")                                                                                          |
  10 - filter(("T"."QUESTION"=U'Cancel Reason' OR "T"."QUESTION"=U'Reject Reason' OR "T"."QUESTION"=U'Reject Reason               |
              For Approval' OR "T"."QUESTION"=U'Reject Reason For Recommendation') AND "T"."RESPONSE" /*+ LOB_BY_VALUE */  IS NOT |
              NULL)                                                                                                               |
  13 - filter(LEVEL<=LENGTH( REGEXP_REPLACE (:B1,'[^,]+'))+1)                                                                     |

Would you please guide me on how I can change the query to get better performance? Any help would be appreciated.

Jon Heller
  • 34,999
  • 6
  • 74
  • 132
john224
  • 17
  • 3

2 Answers2

1

Regular expressions are slow and hierarchical queries may be slower than recursive queries.

You can split strings using simple string functions (which is more to type but more efficient):

WITH responses (task_id, response, spos, epos) AS (
  SELECT task_id,
         response,
         1,
         INSTR(response, ',', 1)
  FROM   sta.sta_acl_wf_task_col_quest
  WHERE  question in (
           'Reject Reason',
           'Cancel Reason',
           'Reject Reason For Recommendation',
           'Reject Reason For Approval'
         )
  AND    response is not null
UNION ALL
  SELECT task_id,
         response,
         epos + 1,
         INSTR(response, ',', epos + 1)
  FROM   responses
  WHERE  epos > 0
)
SELECT task_id,
       CASE epos
       WHEN 0
       THEN SUBSTR(response, spos)
       ELSE SUBSTR(response, spos, epos - spos)
       END AS response
FROM   responses

However, it would be even better if you did not store values using comma-separated variables and instead use separate rows of a table.

MT0
  • 143,790
  • 11
  • 59
  • 117
0

Try removing the unnecessary TABLE(CAST(MULTISET. The simpler expression is more likely to lead to a better cardinality estimate, which is more likely to lead to a better execution plan.

For example, below is the explain plan for the original subquery. (Although I had to add literals to make the code work on my system.) The query returns 4 rows but Oracle thinks it will return 8168. This is because Oracle just gives up on trying to estimate unpredictable table functions and returns a number close to the block size. This might explain your explain plan's 16360 - I'm guessing your system has a 16K block size?

explain plan for
select * from
table(cast(multiset(select level from dual connect by  level <= length (
regexp_replace(/*It.RESPONSE*/ 'A,B,C,D', '[^,]+'))  + 1) as sys.OdciNumberList)) levels
WHERE /*t.question*/ 'Cancel Reason' in ('Reject Reason','Cancel Reason','Reject Reason For Recommendation','Reject Reason For Approval');

select * from table(dbms_xplan.display);


Plan hash value: 3985296316
 
-------------------------------------------------------------------------------------------
| Id  | Operation                          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |      |  8168 | 16336 |    29   (0)| 00:00:01 |
|   1 |  COLLECTION ITERATOR SUBQUERY FETCH|      |  8168 | 16336 |    29   (0)| 00:00:01 |
|*  2 |   CONNECT BY WITHOUT FILTERING     |      |       |       |            |          |
|   3 |    FAST DUAL                       |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(LEVEL<=2)

Removing the TABLE(CAST(MULTISET helps Oracle make a much better estimate of 1 row, which is pretty close to the actual 4. (But if you use MTO's suggestion to avoid storing comma-separated lists, you could simplify your code and further improve cardinality estimates.)

explain plan for
select * from
(select level from dual connect by  level <= length (
regexp_replace(/*It.RESPONSE*/ 'A,B,C,D', '[^,]+')) + 1) levels
WHERE /*t.question*/ 'Cancel Reason' in ('Reject Reason','Cancel Reason','Reject Reason For Recommendation','Reject Reason For Approval');

select * from table(dbms_xplan.display);


Plan hash value: 2403765415
 
--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  VIEW                         |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  2 |   CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   3 |    FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(LEVEL<=4)

But this is just a guess! We're dealing with explain plan guesses instead of execution plan actual values. Simplifying queries to get closer to actual values is almost always helpful, but sometimes it can paradoxically make things worse.

If you really want to optimize your query, try these steps and modify your post to get additional feedback:

  1. How long does the query take? How long do you expect it to take?
  2. How large are your tables (using DBA_SEGMENTS.BYTES), and what percentage of rows are returned from each table? Don't worry too much about full table scans. Indexes are better for retrieving a small percentage of rows from a table, but full table scans are better for retrieving a large percentage of rows from a table. If part of your query must inevitably return 50% of a large table, then the limiting factor will be how fast can your hardware read the X gigabytes of data for that table.
  3. Get actual numbers instead of guesses. Find the SQL_ID of your query by querying GV$SQL, and then run select dbms_sqltune.report_sql_monitor('<sql_id>') from dual;. The results will give you the actual amount of time for each operation and the actual number of rows. It will tell you which operations to worry about, and give you clues to why Oracle chose a bad plan. Explain plans are a decent start, but they are pathetic compared to the actual values. This step may take you hours to understand.
Jon Heller
  • 34,999
  • 6
  • 74
  • 132