I have a SCD-2 table which holds 400M+ records. On daily basis I get 40M records out of which ~20M gets inserted & updated. I use Talend & Oracle for this project. The problem is it takes a lot of time (many hours) to update this number of records. My SCD-2 target table has structure as below:
SKEY NUMBER(38,0)
AS_OF_DATE DATE
TSTP DATE
COUNTRY VARCHAR2(2 BYTE)
RISK VARCHAR2(50 BYTE)
DATE_OF_DATA DATE
PRIMARY_COUNTRY VARCHAR2(5 BYTE)
RISK_VALUES NUMBER(8,5)
START_DATE DATE
END_DATE DATE
Primary key- Skey,as_of_date,primary_country and (monthly) partitioned on as_of_date (daily process date). How can I improve the performance of the job to be able to update records in target table faster?
I have made attempts to insert all the data for update in a TEMP stage table and then use MERGE to update records in the target table. Also I ran one update statement using inner join between stage and target table but I still see poor performance. The target table is indexed on as_of_date, clustered index on primary_country and end_date.
The queries used are:
*merge into geo_crisks_delta D
using (select as_of_date,primary_country,
skey,end_date
from geo_crisks_delta_test) T
ON
(D.as_of_date= T.as_of_date
and D.primary_country=T.primary_country
and D.skey=T.skey)
when matched then
update
set D.end_date = T.end_date
where
D.primary_country in
(select distinct country from geo_countries) and
(D.end_date=to_date('2099-12-31','yyyy-MM-dd'));*
//OR
*update
(select a.end_date as delta,b.end_date as stage
from
geo_crisks_delta a
inner join geo_crisks_delta_test b
on
a.as_of_date=b.as_of_date
and a.primary_country=b.primary_country
and a.skey=b.skey
where a.end_date=to_date('2099-12-31','yyyy-MM-dd') ) t
set t.delta=t.stage;*
I have less understanding of writing stored procedures. Can anyone help what should I do to improve this and what is it that I am doing wrong?