
I'm quite new to coding in general and I'm looking to copy 47 columns with c.300,000 rows of data from an Oracle database to a SQL Server database, on a daily basis. The code will run as a Windows Service, at the same time every day (or more likely night).

The data from the Oracle DB table (let's call this Oracle_Source) will be used both to append to a history table (call this SQL_History) and to append new/update matching/delete missing rows in a live table (call this SQL_Live). The two databases are housed on different servers, but the two SQL tables are in the same database.

I have a few questions around the best way to approach this.

  1. Using VB/C#, is it faster to loop through rows of Oracle_Source (either one by one or in batches of 100/1,000/etc.) and insert/update SQL_History/SQL_Live, OR to copy the entire Oracle_Source table in one go and insert it into the SQL tables? Previously I have used the loop approach to download data into a .csv.

  2. Using the more efficient of the above methods, would it be faster to work on both SQL tables simultaneously OR copy the data into the SQL_History table and then use that to APPEND/UPDATE/DELETE from the SQL_Live table?

  3. Am I approaching this completely wrong?

Any other advice available is also much appreciated.

    One option would be using [SqlBulkCopy](https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx) to copy the Oracle data into a SQL staging table, then use that to update/insert/delete the actual tables. – stuartd Jun 16 '17 at 08:42
  • Scheduled SSIS package? – Richard Jun 16 '17 at 08:42
  • @stuartd that looks like a good method and should sit nicely in a Windows Service. Will the class be able to hold such large volumes of data to update/insert/delete in a single job, do you think, or would it be better to split it into blocks (100/1,000/10,000/etc.)? – Wowdude Jun 16 '17 at 09:09
  • @Richard I like the look of the SSIS package, however I (annoyingly) don't have direct access to the SQL server, only remote access to its tables through SQL Server Management Studio. – Wowdude Jun 16 '17 at 09:13
  • I don't think loading 300k rows into a single SqlBulkCopy would be a good idea, so yes I would think you should batch them. You can experiment to see what size works best. – stuartd Jun 16 '17 at 09:13
  • @stuartd Thanks! I'll give this a go and (as you said) play around with the number of rows to see how our network and servers handle it. – Wowdude Jun 16 '17 at 09:16
  • You'll want some kind of ObjectDataReader like [this one](http://www.developerfusion.com/article/122498/using-sqlbulkcopy-for-high-performance-inserts/) or [this one](https://code.msdn.microsoft.com/ObjectDataReader-8476dd72) _(note I haven't used either of those)_ – stuartd Jun 16 '17 at 09:32
  • @stuartd ah brilliant! Thanks again! Looks like there are quite a number of NuGet packages as well that contain similar objects, but I am liking the look of the second one you posted (the class). – Wowdude Jun 16 '17 at 09:41

1 Answer


The correct question is “What is the fastest way to copy the table?” In your specific case, with two different servers and a “big” table to copy, you are probably limited by network I/O.

So, the first point is to move only the rows that must change (UPDATE/INSERT/DELETE), so there are fewer bytes to push across the network.
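
To make that concrete, here is a minimal sketch of the staging-table approach suggested in the comments: bulk-load the Oracle rows into a staging table, then let a single `MERGE` apply only the differences to SQL_Live. The staging table, key, and column names (`SQL_Staging`, `Id`, `Col1`) are placeholders, not your actual schema.

```csharp
using System.Data.SqlClient;

static void SyncLiveFromStaging(string connectionString)
{
    // One MERGE touches only the rows that differ: update matches,
    // insert new rows, and delete rows missing from the source.
    const string mergeSql = @"
        MERGE dbo.SQL_Live AS target
        USING dbo.SQL_Staging AS source
            ON target.Id = source.Id
        WHEN MATCHED THEN
            UPDATE SET target.Col1 = source.Col1
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (Id, Col1) VALUES (source.Id, source.Col1)
        WHEN NOT MATCHED BY SOURCE THEN
            DELETE;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(mergeSql, conn))
    {
        cmd.CommandTimeout = 600; // generous headroom for ~300k rows
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}
```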

To answer your first point: use transactions to improve write speed on SQL Server. The right transaction size depends on different factors (database, machine, ...), but I usually make transactions of 500/1,000 simple commands. In my personal experience, if you use multi-row INSERT statements, you can send around 500 rows per INSERT without performance issues.
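
A rough sketch of that batching in C# (the row type `MyRow` and the table/column names are placeholders; the batch size of 1,000 is the kind of value worth experimenting with):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

class MyRow { public int Id; public string Col1; } // placeholder row type

static void InsertInBatches(string connectionString, IEnumerable<MyRow> rows)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        SqlTransaction tx = conn.BeginTransaction();
        int pending = 0;

        foreach (var row in rows)
        {
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.SQL_History (Id, Col1) VALUES (@id, @col1)",
                conn, tx))
            {
                cmd.Parameters.AddWithValue("@id", row.Id);
                cmd.Parameters.AddWithValue("@col1", row.Col1);
                cmd.ExecuteNonQuery();
            }

            // Commit every 1,000 commands instead of one transaction per row.
            if (++pending == 1000)
            {
                tx.Commit();
                tx = conn.BeginTransaction();
                pending = 0;
            }
        }

        tx.Commit(); // flush the final partial batch
    }
}
```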

In my experience, a bulk copy is faster than even efficient INSERT, UPDATE and DELETE statements, because the database does not maintain keys and does not check for duplicate rows during the load.

Better explained:

  1. you TRUNCATE all data
  2. DISABLE keys
  3. massive INSERT of all rows and
  4. re-ENABLE keys.

This is the fastest way to copy a table, but if the servers communicate over a slow network it may not be the best choice. Obviously, the best choice depends on your infrastructure and the table size; a sketch of the pattern follows below.
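
A minimal sketch of that pattern with `SqlBulkCopy`, streaming straight from an Oracle data reader (the table and index names are placeholders, and it assumes the reader's columns line up with the destination's):

```csharp
using System.Data;
using System.Data.SqlClient;

static void ReloadTable(string connectionString, IDataReader oracleReader)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        void Exec(string sql)
        {
            using (var cmd = new SqlCommand(sql, conn)) cmd.ExecuteNonQuery();
        }

        Exec("TRUNCATE TABLE dbo.SQL_Live;");                  // 1. truncate all data
        // 2. disable keys: disable nonclustered indexes only, since a
        // disabled clustered index would make the table unwritable.
        Exec("ALTER INDEX IX_SQL_Live_Col1 ON dbo.SQL_Live DISABLE;");

        using (var bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "dbo.SQL_Live";
            bulk.BatchSize = 10000;           // experiment: 1k/10k/50k
            bulk.BulkCopyTimeout = 0;         // no timeout on the big load
            bulk.WriteToServer(oracleReader); // 3. massive INSERT of all rows
        }

        Exec("ALTER INDEX ALL ON dbo.SQL_Live REBUILD;");      // 4. re-enable keys
    }
}
```

On a live table you may prefer to run this against a staging table and swap afterwards, so readers never see it empty.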

For example:

  • If you have one server on your LAN and the second server in the cloud, the bottleneck is the speed of the internet connection, and you must pay more attention to efficient communication (fewer bytes).
  • If both servers are on your LAN with gigabit connections, the full network throughput is probably around 100 MB/s, and you can simply move all the table rows without headache.
Linefinc
  • Excellent explanation! I'll look into using the BulkCopy to bring it across. We have pretty good connection speeds to our servers tbh. I was able to write 50,000 rows to a `.csv` in approximately 20 seconds pulling from the Oracle server. Thanks! – Wowdude Jun 16 '17 at 09:57