0

For obvious performance reasons, I would like to rewrite an existing Oracle SQL query that includes correlated subqueries involving "not in" clauses. Can this be accomplished through outer joins or perhaps some other technique?

Here is the code:

SELECT TRIM(et.event_id), TRIM(et.cancel_evt_id)
FROM external_transactions et
JOIN transaction_type tt
    ON et.transaction_type_id = tt.transaction_type_id
WHERE et.acct = 'ABCDEF' 
AND tt.transaction_type_class != 'XYZXYZ'
AND 
(
    TRIM(et.event_id) NOT IN
        (
        SELECT TRIM(t1.transaction_evt_id)
        FROM transactions t1
        WHERE t1.acct = et.acct
        AND t1.asset_id = et.asset_id
        AND t1.cancel_flag = 'N' 
        )
    OR TRIM(et.cancel_evt_id) NOT IN
        (
        SELECT TRIM(t2.cancel_evt_id)
        FROM transactions t2
        WHERE t2.acct = et.acct
        AND t2.asset_id = et.asset_id
        AND t2.cancel_flag = 'Y'
        )
)
;
ptc3
  • 3
  • 2
  • 2
    What are the "obvious reasons"? I suspect that the use of `trim()` on the ids is making the query less efficient than it otherwise would be. – Gordon Linoff Dec 17 '13 at 15:32
  • Post your execution plan. – Yaroslav Shabalin Dec 17 '13 at 15:39
  • doing left-joins and applying WHERE IS NULL of a column is typically MUCH FASTER. However, that being said... what is the actual data type of "et.event_id" and "t1.transaction_evt_id". If numeric, doing TRIM() is bad as also stated by Gordon. – DRapp Dec 17 '13 at 15:41
  • On et, columns are varchar2(55). On t, the columns are varchar2(255). The schemas were likely created at different times. For a quick win, I will experiment with removing the trim() statements to see if they are logically necessary. But I suspect that I will need to restructure the query more significantly. Table/field names have been renamed to protect my client's data, so I will not be able to post the explain plan. – ptc3 Dec 17 '13 at 15:59
  • @DRapp I have not experienced performance problems specifically related to correlated subqueries. The Oracle CBO may elect to rewrite correlate subqueries as joins under the hood, if it feels that generates a better performance plan. See this [Ask Tom](http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3167884300346662300) response. I say write the query so it's clear and let the CBO do it's magic. – WoMo Dec 17 '13 at 16:07

2 Answers2

0

Using NOT EXISTS:

SELECT TRIM(et.event_id), TRIM(et.cancel_evt_id)
FROM external_transactions et
JOIN transaction_type tt
    ON et.transaction_type_id = tt.transaction_type_id
WHERE et.acct = 'ABCDEF' 
AND tt.transaction_type_class != 'XYZXYZ'
AND NOT EXISTS
        (
        SELECT 1
        FROM transactions t1
        WHERE t1.acct = et.acct
        AND t1.asset_id = et.asset_id
        AND ( (   t1.cancel_flag = 'N'
              AND TRIM(et.event_id) = TRIM(t1.transaction_evt_id) )
            OR
              (   t1.cancel_flag = 'Y'
              AND TRIM(et.cancel_evt_id) = TRIM(t1.cancel_evt_id) )
            )
        );
MT0
  • 143,790
  • 11
  • 59
  • 117
  • correlated sub-queries are typically horrible for performance. – DRapp Dec 17 '13 at 15:56
  • Really? [What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?](http://stackoverflow.com/questions/2246772/whats-the-difference-between-not-exists-vs-not-in-vs-left-join-where-is-null) or [question regarding "Correlated Subqueries"](http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3167884300346662300) or [NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle](http://explainextended.com/2009/09/17/not-in-vs-not-exists-vs-left-join-is-null-oracle/) – MT0 Dec 17 '13 at 16:26
  • Not Exists and Not In will pre-run the entire query and return all rows first, then be applied to rest. By doing LEFT JOIN, the engine just sets up the relationships between the tables and it's either their or not (hence looking for the "ID" IS NULL). The record/index pointers move at the same time as each record from the main table is being considered. Say you have 100k cancel flags, but you only care about event X and that has 200 records, of which 18 are cancelled. Blowing through 200 and you are done. – DRapp Dec 17 '13 at 16:44
  • Also, your NOT EXISTS as a correlated sub query is running that query every time for every record vs the relationships just established between the tables. – DRapp Dec 17 '13 at 16:46
0

Aside from comment, this is assuming your "ID" columns are integer based and not string, don't try to convert them.

Also, to help optimize the query, I would ensure you have indexes

External_Transactions    ( acct, event_id, cancel_evt_id )
Transaction_Type   ( transaction_type_id, transaction_type_class )
Transactions ( transaction_evt_id, acct, asset_id, cancel_flag )
Transactions ( cancel_evt_id, acct, asset_id, cancel_flag )


SELECT 
      et.event_id, 
      et.cancel_evt_id
   FROM 
      external_transactions et
         JOIN transaction_type tt
            ON et.transaction_type_id = tt.transaction_type_id
           AND tt.transaction_type_class != 'XYZXYZ'
         LEFT OUTER JOIN transactions t1
            ON  et.event_id = t1.transaction_evt_id
            AND et.acct = t1.acct
            AND et.asset_id = t1.asset_id
            AND t1.cancel_flag = 'N'
         LEFT OUTER JOIN transactions t2
            ON  et.cancel_evt_id = t2.cancel_evt_id
            AND et.acct = t2.acct
            AND et.asset_id = t2.asset_id
            AND t2.cancel_flag = 'Y'
   WHERE 
          et.acct = 'ABCDEF' 
      AND (    t1.transaction_evt_id IS NULL
            OR t2.cancel_evt_id IS NULL )

You might even benefit slightly if the transaction table had an index on

Transactions ( acct, asset_id, cancel_flag, transaction_evt_id, cancel_evt_id )

and the left-join was like

SELECT 
      et.event_id, 
      et.cancel_evt_id
   FROM 
      external_transactions et
         JOIN transaction_type tt
            ON et.transaction_type_id = tt.transaction_type_id
           AND tt.transaction_type_class != 'XYZXYZ'
         LEFT OUTER JOIN transactions t1
            ON  et.acct = t1.acct
            AND et.asset_id = t1.asset_id
            AND ( 
                  (   t1.cancel_flag = 'N'
                  AND et.event_id = t1.transaction_evt_id )
                OR
                  (   t1.cancel_flag = 'Y'
                  AND et.cancel_event_id = t1.cancel_evt_id )
                )
   WHERE 
          et.acct = 'ABCDEF' 
      AND t1.transaction_evt_id IS NULL

In both cases, the indexes would be COVERING indexes so it did not have to go back to the raw data pages to confirm other elements of the records

DRapp
  • 47,638
  • 12
  • 72
  • 142
  • DRapp, your first solution appears to do the trick. The Oracle Explain Plan cost estimate is reduced from 315430 to 8715. I'm still testing to ensure the new query returns the correct results, but I haven't found a test condition that breaks it yet. – ptc3 Dec 17 '13 at 18:19
  • And I will also be investigating whether the trim() function is logically necessary, based on suggestions from several folks. – ptc3 Dec 17 '13 at 18:21
  • @ptc3, glad it worked... but just wondering the overall performance improvement before/after times... – DRapp Dec 19 '13 at 03:13