If you would have a massive problems with you approach, you are very probably missing an index on the column clean.id
, that is required for your approach when the MERGE
uses dual
as a source for each row.
This is less probable while you are saying the id
is a primary key.
So basically you are doing the right think and you will see execution plan similar as the one below:
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | | | 2 (100)| |
| 1 | MERGE | CLEAN | | | | |
| 2 | VIEW | | | | | |
| 3 | NESTED LOOPS OUTER | | 1 | 40 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | DUAL | 1 | 2 | 2 (0)| 00:00:01 |
| 5 | VIEW | VW_LAT_A18161FF | 1 | 38 | 0 (0)| |
| 6 | TABLE ACCESS BY INDEX ROWID| CLEAN | 1 | 38 | 0 (0)| |
|* 7 | INDEX UNIQUE SCAN | CLEAN_UX1 | 1 | | 0 (0)| |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("CLEAN"."ID"=:ID)
So the execution plan is fine and works effectively, but it has one problem.
Remember always you use an index, you will be happy while processing few rows, but it will not scale.
If you are processing a millions of records, you may fall back to a two step processing,
The big advantage is that Oracle can open a hash join
and get rid of the index access for each of the million rows.
Here an example of a test of the clean
table initiated with 1M id
(not shown) and performing 1M insert and 1M updates:
n = 1000000
data2 = [{"id" : i, "xcount" :1} for i in range(2*n)]
sql3 = """
insert into tmp (id,count)
values (:id,:xcount)"""
sql4 = """MERGE into clean USING tmp on (clean.id = tmp.id)
when not matched then insert (id, count) values (tmp.id, tmp.count)
when matched then update set clean.count= clean.count + tmp.count"""
cursor.executemany(sql3, data2)
cursor.execute(sql4)
The test runs in aprox. 10 second, which is less than a half of you approach with MERGE
using dual
.
If this is still not enough, you'll have to use parallel option.