
I have a question regarding a good way to optimize the transfer of anywhere from 1 billion to 1.3 billion records (with capacity up to 2 billion or more records in the future) between Oracle and SQL Server.

*EDIT* I forgot to mention -- this process needs to be run weekly, within the time frame of one weekend.

Currently I plan on using an ETL package; I'm looking at Talend or SSIS right now.

Here is the data setup. The data resides on Oracle, and needs to be gathered first and repackaged before transmission.

Currently my process is this: 1) A stored proc extracts all the data and repackages it into an Oracle table. 2) An ETL job scans this table and moves it to the SQL Server DB. 3) The tables on Oracle and SQL Server are identical.

I was looking for some thoughts on optimizing the process. What would be good ways to optimize the inserts into SQL Server?

1.) I was thinking of simply dropping the table on SQL Server and then doing a create table as select * from oracle_table.

2.) A redesign of the stored proc so that it runs in the ETL (saving me from having to insert into the Oracle table) and inserts directly into SQL Server -- on low-volume tests with Talend (I can't speak for SSIS) I ran into performance issues: the restructured procedure took longer to extract the data than calling it directly on Oracle did. Is this normal, or did I mess up the optimization?

3.) Is there a better way to do massive data transmission that I am not aware of?

Any advice would be great. I think linked servers are an option here, but one is not set up, and I've never set one up before.
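For reference, I imagine option 1 would look roughly like this on the SQL Server side if a linked server were in place (ORACLE_LINK and the table names below are just placeholders):

```sql
-- Rough sketch of option 1, assuming a linked server named ORACLE_LINK existed.
IF OBJECT_ID('dbo.target_table', 'U') IS NOT NULL
    DROP TABLE dbo.target_table;

-- SELECT ... INTO is SQL Server's closest equivalent to CREATE TABLE AS SELECT
SELECT *
INTO dbo.target_table
FROM OPENQUERY(ORACLE_LINK, 'SELECT * FROM repackaged_table');
```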

Any points/tips would be great, as I'm pretty new to this whole thing.

    Are you really generating 1-2 billion brand new rows every single week? That seems rather unlikely. It seems highly likely that the biggest source of optimization is building a change data capture process so that you're only moving whatever small fraction of rows were actually added or modified over the week. – Justin Cave Jul 09 '14 at 15:46
  • Actually yes -- I know it sounds crazy, but the data changes from week to week and is self correcting. – A Confused Dude Jul 09 '14 at 17:59
  • So the vast majority of your 1-2 billion rows are changing every week? That seems very unlikely. Even if you have a process that is correcting data, it seems highly likely that from one week to the next the vast majority of data is not changing. – Justin Cave Jul 09 '14 at 18:09
  • Wouldn't a backup and restore be of any help on this big a table? If both servers use the same SAN, it would certainly be beneficial. I have not done this before, but just throwing it out there so that someone who has could shed some factual light on it. – rvphx Jul 10 '14 at 07:59

2 Answers


Your best approach would be:

  1. Push the results of the stored procedure directly to a flat file.
  2. Use bcp or BULK INSERT to push the flat file into SQL Server (a sketch of the BULK INSERT variant follows below).
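A minimal sketch of step 2, assuming the extract lands in a pipe-delimited file at C:\staging\extract.dat and loads into dbo.target_table (the path, delimiter, and table name are all placeholders):

```sql
-- Minimal BULK INSERT sketch; adjust the format options to match the real extract file.
BULK INSERT dbo.target_table
FROM 'C:\staging\extract.dat'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    TABLOCK,              -- a table lock enables minimally logged bulk loads
    BATCHSIZE = 1000000   -- commit in chunks so a failure doesn't roll back the whole load
);
```

The bcp command-line utility takes essentially the same options (for example `bcp TargetDb.dbo.target_table in C:\staging\extract.dat -S yourserver -T -c -t"|"`) and is easy to script from a scheduler.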

If you absolutely need a copy of the table in Oracle, then I would at least compare the performance of:

  • writing to the table using the stored procedure
  • importing the same flat file you generated into Oracle using its bulk loader, SQL*Loader (see the sketch below)
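For reference, a SQL*Loader load of that flat file would look roughly like this; the control file contents, object names, and credentials below are placeholders:

```
-- load_extract.ctl (placeholder names); a direct path load bypasses conventional inserts
LOAD DATA
INFILE 'extract.dat'
APPEND
INTO TABLE repackaged_table
FIELDS TERMINATED BY '|'
(header_id, detail_line, payload)
```

invoked with something like `sqlldr userid=scott/tiger control=load_extract.ctl direct=true`.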
Kyle Hale
  • Having only had a sample size of one project moving data from Oracle to SQL Server, could you help me and future readers understand why you'd pay the additional cost of exporting to file and importing from file to SQL Server versus just keeping it all in memory with an ETL tool like SSIS? – billinkc Jul 09 '14 at 15:21
  • @billinkc : I'm not looking for a flame-war, but are you saying that in your experience SSIS is faster than export/import? Maybe we didn't have SSIS setup correctly, but one project I worked on, export/load was 1/2 the overall time. – shellter Jul 09 '14 at 15:41
  • Even with the Attunity adapters, Oracle sources in SSIS have never performed as well for high-volume transfers as Oracle->file->BULK INSERT, even with the cost. For us, we've seen the performance tradeoff when moving as little as 30 million (wide) rows. Plus you can much more easily parallelize the file process, although I would be curious to see how Attunity's parallelism feature works with a sort of partitioned stored procedure hitting all the same base tables at different index points. – Kyle Hale Jul 09 '14 at 15:42
  • For most purposes, I would just use SSIS - Source in, Destination out. But Oracle as a source for large datasets is definitely a case where (in my experience) data flow throughput is perfectly fine, but suboptimal compared to a flat file middleman. – Kyle Hale Jul 09 '14 at 15:43
  • @shellter Nor am I. Great discussion here btw and should be patched into Kyle's answer to help people understand *why* versus *internet sez*. Standard caveat of "it depends" applies, but my point was that in a general sense, `(Disk Reads from source + disk write for export + disk read for import + disk write for SQL Server) > (Disk reads from source + abuse of memory + Disk write for SQL Server)`. If that formula doesn't hold true for Oracle in the Colletive's experience, then I'll make a note to always evaluate the two approaches. – billinkc Jul 09 '14 at 15:50
  • About your solution of flat file > bulk import: Thanks, I'll give it a shot. I need to do a lot of conversion and research based on the information in my procedure. Essentially my procedure is: bulk collect the headers -> fetch the details (making some tweaks too; about 0-52 lines of details per header), convert this information into a 'flat' record, and then insert it. Is there an easy way to spool a stored procedure, or will I have to rewrite this using something else? – A Confused Dude Jul 10 '14 at 15:54
  • There are commands in Oracle to write the results of a stored procedure to a file. See this SO question for an example: http://stackoverflow.com/questions/9565316/oracle-create-text-file-using-stored-procedure – Kyle Hale Jul 10 '14 at 18:03
  • You rock Kyle. Thanks – A Confused Dude Jul 10 '14 at 19:50
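For reference, the approach Kyle points to in the comments above (writing the procedure's rows to a flat file with UTL_FILE) looks roughly like this; the directory object, file name, and columns are placeholders:

```sql
-- Rough PL/SQL sketch: spool rows to a pipe-delimited file with UTL_FILE.
-- EXTRACT_DIR must be an existing Oracle directory object; names are placeholders.
DECLARE
  f UTL_FILE.FILE_TYPE;
BEGIN
  f := UTL_FILE.FOPEN('EXTRACT_DIR', 'extract.dat', 'W', 32767);
  FOR r IN (SELECT header_id, detail_line, payload FROM repackaged_table) LOOP
    UTL_FILE.PUT_LINE(f, r.header_id || '|' || r.detail_line || '|' || r.payload);
  END LOOP;
  UTL_FILE.FCLOSE(f);
END;
/
```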

There are some best practices and considerations that can help you do this in a better way:

  • Use partitioning on the destination table.
  • Use a memory-optimized table for staging, or even as the destination.
  • Don't use select *; pull only the columns you actually need, and list them explicitly.
  • Use bulk insert (especially the fast-load options in SSIS).
  • Use a drop/recreate index strategy for both source and destination, where appropriate (see the sketch below).
  • Use the Microsoft Attunity connector for Oracle (in SSIS).
  • Load in parallel.
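As an illustration of the index strategy around a bulk load, something along these lines is typical (the table and index names below are placeholders, not objects from the question):

```sql
-- Sketch only: disable nonclustered indexes during the weekly load, rebuild once afterwards.
ALTER INDEX IX_target_lookup ON dbo.target_table DISABLE;

-- ... run the bulk load here (BULK INSERT, bcp, or an SSIS fast-load destination) ...

ALTER INDEX IX_target_lookup ON dbo.target_table REBUILD;  -- one rebuild is far cheaper than per-row index maintenance
```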