My C# application retrieves over a million records from SQL Server, processes them, and then writes the updates back to the database. This results in close to 100,000 update statements, all of the following form:
```sql
update Table1 set Col1 = <some number> where Id in (n1, n2, n3, ..., n200)
```
"Id" is an int, primary key with clustered index. No two update statements update the same Ids, so in theory, they can all run in parallel without any locks. Therefore, ideally, I suppose I should run as many as possible in parallel. The expectation is that all finish in no more than 5 minutes.
Now, my question is: what is the most efficient way of doing this? Here is what I have tried so far:
- Running them sequentially, one by one (first sketch after this list). This is the least efficient approach; it takes over an hour.
- Running them in parallel by launching each update in its own thread (second sketch below). Again very inefficient, because this creates thousands of threads. I tried it anyway; it took over an hour, and quite a few of the updates failed with one connection error or another.
- Bulk inserting into a new table and then doing a join for the update (third sketch below). But then we run into concurrency issues, because more than one user is expected to be doing this at the same time.
- Using MERGE in batches instead of individual updates. From what I've read, MERGE is actually slower than individual update statements, so I haven't tried it.
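
For reference, here is roughly what the sequential version looks like. This is a simplified sketch: `connectionString` and the code that builds the (value, id-list) batches are placeholders, and parameterization is omitted for brevity.

```csharp
// Simplified sketch of the sequential version. Table1 and Col1 match the
// update statement above; connectionString and the code that builds the
// (value, id-list) batches are placeholders.
using System.Collections.Generic;
using System.Data.SqlClient;

public static class SequentialUpdater
{
    public static void Run(string connectionString,
                           IEnumerable<(int Value, string IdList)> batches)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (var batch in batches)
            {
                // One round trip per statement, close to 100,000 in total.
                var sql = $"update Table1 set Col1 = {batch.Value} where Id in ({batch.IdList})";
                using (var cmd = new SqlCommand(sql, conn))
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```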
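The thread-per-update attempt looked roughly like this. Each thread opens its own connection; with thousands of threads that exhausts the connection pool, which I assume is what caused the connection failures.

```csharp
// Sketch of the thread-per-update attempt: one thread and one connection
// per batch, thousands of each in flight at once.
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading;

public static class ThreadPerUpdateUpdater
{
    public static void Run(string connectionString,
                           IReadOnlyList<(int Value, string IdList)> batches)
    {
        var threads = new List<Thread>();
        foreach (var batch in batches)
        {
            var thread = new Thread(() =>
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open(); // thousands of concurrent opens strain the pool
                    var sql = $"update Table1 set Col1 = {batch.Value} where Id in ({batch.IdList})";
                    using (var cmd = new SqlCommand(sql, conn))
                    {
                        cmd.ExecuteNonQuery();
                    }
                }
            });
            thread.Start();
            threads.Add(thread);
        }
        threads.ForEach(t => t.Join());
    }
}
```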
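And the bulk-insert-then-join attempt. `StagingTable` here is a hypothetical name for the new table, assumed to have int columns (Id, NewCol1); because it is shared, two users running this at the same time would mix their rows, which is the concurrency issue mentioned above.

```csharp
// Sketch of the bulk-insert-then-join attempt. StagingTable is a
// hypothetical shared table with int columns (Id, NewCol1); the
// "updates" DataTable is expected to have the same two columns.
using System.Data;
using System.Data.SqlClient;

public static class BulkCopyUpdater
{
    public static void Run(string connectionString, DataTable updates)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            // Stream all (Id, NewCol1) pairs into the staging table in one shot.
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "StagingTable" })
            {
                bulk.WriteToServer(updates);
            }
            // Apply everything with a single set-based join update.
            const string sql = @"
                update t set t.Col1 = s.NewCol1
                from Table1 t
                join StagingTable s on s.Id = t.Id;";
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```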
I suppose this must be a very common problem for applications that handle sizeable amounts of data. Are there any standard solutions? Any ideas or suggestions would be appreciated.