1

I have a requirement where I need to update thousands of records in live database table, and although there are many columns in this table, I only need to update 2-3 columns.

Further I can't hit database for thousand times just for updating which can be done in a batch update using SQL Server Table Valued Parameter. But again I shouldn't update all thousands records in one go for better error handling, instead want to update records in batches of x*100.

So, below is my approach, please give your valuable inputs for any other alternatives or any change in the proposed process -

  • 1 Fetch required records from database to List<T> MainCollection
  • 2 Save this collection to XML file with each element Status = Pending
  • 3 Take first 'n' elements from XML file with Status = Pending and add them to new List<T> SubsetCollection
  • 4 Loop over List<T> SubsetCollection - make required changes to T
  • 5 Convert List<T> SubsetCollection to DataTable
  • 6 Call Update Stored Procedure and pass above DataTable as TVP
  • 7 Update Status = Processed for XML Elements corresponding to List<T> SubsetCollection
  • 8 If more records with Pending status exists in XML file, go to Step# 3.

Please guide for a better approach or any enhancement in above process.

DaveShaw
  • 52,123
  • 16
  • 112
  • 141
inutan
  • 10,558
  • 27
  • 84
  • 126
  • If the data is coming from the same database perhaps try a CLR function? You can write it in C# and you can let it rip on your database without worrying too much about doing it in batches. – Jeremy Nov 30 '11 at 22:36
  • Information available here: http://msdn.microsoft.com/en-us/library/ms189876.aspx – Jeremy Nov 30 '11 at 22:36
  • 2
    How is this different from your [earlier question](http://stackoverflow.com/questions/8318614/c-sharp-batch-update-to-database-using-stored-procedure)? – Adam Lear Dec 01 '11 at 01:30
  • Is the logic really so complicated that you must use c#? I understand you use the language that your familiar with and understand, but I bet you'd be surprised in what can be accomplished, as a set, with things like temp tables and case statements. – brian Dec 01 '11 at 03:15
  • Please refrain from reposting your questions. If you need to add details, edit your original question. – NullUserException Dec 01 '11 at 04:39

2 Answers2

3

I would do a database-only approach if possible and if not possible, eliminate the parts that will be the slowest. If you are unable to do all the work in a stored procedure, then retrieve all the records and make changes.

The next step is to write the changes to a staging table with SQL Bulk Copy. This is a fast bulk loaded that will copy thousands of records in seconds. You will store the primary key and the columns to be updated as well as a batch number. The batch number is assigned to each batch of records, therefore allowing another batch to be loaded without conflicting with the first batch.

Use a stored procedure on the server to process the records in batches of 100 or 1000 depending on performance. Pass the batch number to the stored procedure.

We use such a method to load and update millions of records in batches. The best speed is obtained by eliminating the network and allowing the database server to handle the bulk of the work.

I hope this might provide you with an alternate solution to evaluate.

Leons
  • 2,679
  • 1
  • 21
  • 25
  • Although I used the same approach as I have mentioned in my question - I mark this as answer for sharing idea and suggesting a different approach to problem :-) – inutan Mar 06 '12 at 22:58
1

It may not be the best practice but you could embed some logic inside a SQL Server CLR function. This function could be called by a Query,StoProc or a schedule to run at a certain time.

The only issue I can see is getting step 4 to make the required changes on T. Embedding that logic into the database could be detrimental to maintenance, but this is no different to people who embed massive amounts of business logic into StoProcs.

Either way SQL Server CLR functions may be the way to go. You can create them in Visual Studio 2008, 2010 (Check the database new project types).

Tutorial : http://msdn.microsoft.com/en-us/library/w2kae45k(v=vs.80).aspx

Jeremy
  • 3,880
  • 3
  • 35
  • 42
  • 1
    Creating CLR objects is tricky business... They're a pain to debug. – JesseBuesking Nov 30 '11 at 23:01
  • If you do it right they are a real gem. You can debug them with a bit of IoC tactics. Ie IDbConnection and the like. – Jeremy Nov 30 '11 at 23:02
  • Oh well. I'm just throwing ideas around. :D – Jeremy Nov 30 '11 at 23:38
  • Thanks a lot for sharing your ideas. It might be easy to use CLR functions but seems to be a bit complicated to me at first look... need something simple and efficient solution... waiting for some more ideas – inutan Nov 30 '11 at 23:48