Update ~30 million records in SCD-2 on daily basis

Question

I have a SCD-2 table which holds 400M+ records. On daily basis I get 40M records out of which ~20M gets inserted & updated. I use Talend & Oracle for this project. The problem is it takes a lot of time (many hours) to update this number of records. My SCD-2 target table has structure as below:

SKEY    NUMBER(38,0)
AS_OF_DATE  DATE
TSTP    DATE
COUNTRY VARCHAR2(2 BYTE)
RISK    VARCHAR2(50 BYTE)
DATE_OF_DATA    DATE
PRIMARY_COUNTRY VARCHAR2(5 BYTE)
RISK_VALUES NUMBER(8,5)
START_DATE  DATE
END_DATE    DATE

Primary key- Skey,as_of_date,primary_country and (monthly) partitioned on as_of_date (daily process date). How can I improve the performance of the job to be able to update records in target table faster?

I have made attempts to insert all the data for update in a TEMP stage table and then use MERGE to update records in the target table. Also I ran one update statement using inner join between stage and target table but I still see poor performance. The target table is indexed on as_of_date, clustered index on primary_country and end_date.

The queries used are:

*merge into geo_crisks_delta D
using (select as_of_date,primary_country,
skey,end_date 
from geo_crisks_delta_test) T 
ON 
(D.as_of_date= T.as_of_date   
 and D.primary_country=T.primary_country
 and D.skey=T.skey)
when matched then
update
set D.end_date = T.end_date
where 
D.primary_country in 
 (select distinct country from geo_countries) and 
(D.end_date=to_date('2099-12-31','yyyy-MM-dd'));*

//OR

*update
(select a.end_date as delta,b.end_date as stage 
from 
geo_crisks_delta a 
inner join geo_crisks_delta_test b
on 
a.as_of_date=b.as_of_date
and a.primary_country=b.primary_country
and a.skey=b.skey
where a.end_date=to_date('2099-12-31','yyyy-MM-dd') ) t 
set t.delta=t.stage;*

I have less understanding of writing stored procedures. Can anyone help what should I do to improve this and what is it that I am doing wrong?

How does the execution plan look? If it's a hash join from smaller to larger and it is showing as OPTIMAL in v$sql_workarea then the only other option is parallel, unless maybe it's possible to populate a second table somehow and switch them around at the end (i.e. replace UPDATE with INSERT). — William Robertson, Jul 26 '21 at 19:43
Please check [here](https://stackoverflow.com/a/34975420/4808122) the *minimum information* you need to provide. — Marmite Bomber, Jul 27 '21 at 07:51
Doing this kind of thing a few thousand rows at a time can make your COMMITs (which possibly are autocommits) a lot faster. 20 megarows' worth of updates in one COMMIT makes your database work fairly hard. Worth a try... — O. Jones, Jul 30 '21 at 13:13

Update ~30 million records in SCD-2 on daily basis

0 Answers0