I am porting some SQL server procedures to Oracle and find an interesting situation where the Oracle SQL statements are dramatically slower than the identical logic using cursors.
On investigation, I think there may be a particualr problem with 'NOT EXISTS' (maybe?).
Here I put 100k leads into TMP_TXN and then use these as a filter to extract records from Payments for which there is no transaction (see the SQL construct below).
INSERT INTO tmp_txn ....
SELECT ....
FROM txn,
customers
WHERE txn.customer_id = customers.customer_id
AND customers.customer_status LIKE 'A%'
AND txn.txn_date BETWEEN start_date AND end_date;
then insert into tmp_leads where they are in payments but not in TMP_TXN.
INSERT INTO tmp_leads ....
SELECT ....
FROM payments eap, customers
WHERE eap.customer_id = customers.customer_id
AND customers.customer_status LIKE 'A%'
AND NOT EXISTS (SELECT TMP_TXN.CUSTOMER_ID
FROM TMP_TXN
WHERE tmp_txn.customer_id = eap.customer_id
AND ....;
The explain plan is:
Plan hash value: 67643415
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 665 | 138K| 15450 (1)| 00:03:06 |
| 1 | LOAD TABLE CONVENTIONAL | TMP_LEADS | | | | |
|* 2 | FILTER | | | | | |
|* 3 | HASH JOIN | | 665 | 138K| 14785 (1)| 00:02:58 |
|* 4 | TABLE ACCESS FULL | CUSTOMER_TYPES | 6 | 36 | 3 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 726 | 146K| 14781 (1)| 00:02:58 |
|* 6 | TABLE ACCESS FULL | EDM_SEGMENTS_EVENTS | 23 | 414 | 5 (0)| 00:00:01 |
| 7 | NESTED LOOPS | | | | | |
| 8 | NESTED LOOPS | | 1297 | 239K| 14776 (1)| 00:02:58 |
|* 9 | TABLE ACCESS FULL | EDM_AGREEMENT_PAYMENTS | 1297 | 158K| 12180 (1)| 00:02:27 |
|* 10 | INDEX UNIQUE SCAN | PK_CUSTOMERS | 1 | | 1 (0)| 00:00:01 |
|* 11 | TABLE ACCESS BY INDEX ROWID| CUSTOMERS | 1 | 64 | 2 (0)| 00:00:01 |
|* 12 | TABLE ACCESS BY INDEX ROWID | TMP_TXN | 1 | 81 | 2 (0)| 00:00:01 |
|* 13 | INDEX RANGE SCAN | IX_TMP_TXN_TXN_CODE | 1 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( NOT EXISTS (SELECT 0 FROM "TMP_TXN" "TMP_TXN" WHERE "TMP_TXN"."TXN_CODE"=:B1 AND
"TMP_TXN"."CUSTOMER_ID"=:B2 AND "TMP_TXN"."AGREEMENT_ID"=:B3 AND ("TMP_TXN"."AMOUNT"<0 AND
"TMP_TXN"."AMOUNT">=:B4*1.1 AND "TMP_TXN"."AMOUNT"<=:B5*.9 OR "TMP_TXN"."AMOUNT">0 AND
"TMP_TXN"."AMOUNT">=:B6*0.9 AND "TMP_TXN"."AMOUNT"<=:B7*1.1)))
3 - access("CUSTOMERS"."CUSTOMER_TYPE"="CUSTOMER_TYPES"."CUSTOMER_TYPE")
4 - filter("CUSTOMER_TYPES"."ACTIVE"=U'1')
5 - access("CUSTOMERS"."CUSTOMER_SEGMENT"="EDM_SEGMENTS_EVENTS"."SEGMENT")
6 - filter("EDM_SEGMENTS_EVENTS"."EVENT_ID"=607 AND "EDM_SEGMENTS_EVENTS"."ACTIVE"=U'1')
9 - filter(ROUND("EAP"."PMNT_DAY",0)>=19 AND ROUND("EAP"."PMNT_DAY",0)<=31 AND
"EAP"."PERIODICITY"=U'M' AND "EAP"."EVENT_ID"=607)
10 - access("EAP"."CUSTOMER_ID"="CUSTOMERS"."CUSTOMER_ID")
11 - filter("CUSTOMERS"."CUSTOMER_STATUS" LIKE U'A%')
12 - filter("TMP_TXN"."CUSTOMER_ID"=:B1 AND "TMP_TXN"."AGREEMENT_ID"=:B2 AND
("TMP_TXN"."AMOUNT"<0 AND "TMP_TXN"."AMOUNT">=:B3*1.1 AND "TMP_TXN"."AMOUNT"<=:B4*.9 OR
"TMP_TXN"."AMOUNT">0 AND "TMP_TXN"."AMOUNT">=:B5*0.9 AND "TMP_TXN"."AMOUNT"<=:B6*1.1))
13 - access("TMP_TXN"."TXN_CODE"=:B1)
There is an index on tmp_txn (customer_id), and there are about 100k records in the table. Oracle has 20gb SGA and 20gb PGA, so this should be cached easily.
Resource plan screenshot Here you can see the script running but not using any resources (data reads <100k/sec!).
The (possible) problem seems to be in the NOT EXISTS in that this selection takes >1000 seconds with almost no access of the data tables (resource monitor). Stats view in OM showing 898 seconds and 100% cpu
Am I doing something stupid in Oracle? This works well (and quickly) in SQL Server.