0

I have a console app that is basically doing this:

foreach (var database in databases)
{
    using (var db = new MyDbContext(database.Name))
    {
         //do some calculations, inserting of thousands of rows, removing thousands of rows, etc
    }
}

All works fine. But I want to improve it. So I was thinking of processing multiple databases at the same time. The number of databases I currently have is about 100 and is growing with time. So I think Parallel.ForEach will be too expensive. What is the best approach?

petko_stankoski
  • 10,459
  • 41
  • 127
  • 231
  • Why do you think Parallel.ForEach will be expensive? And what do you mean by "expensive" in this context? – stuartd Apr 15 '18 at 20:15
  • @stuartd Well 100 databases will be processed at the same time. The server won't be able to handle it. – petko_stankoski Apr 15 '18 at 20:17
  • You might want to look into `async/await` as that will remove blocks on your thread(s) while the DB IO is occurring. That can be used to parallelize the IO. If you want to parallelize CPU bound calculations then `Parallel.ForEach` could also be useful. – juharr Apr 15 '18 at 20:17
  • 2
    You can set a limit on Parallel.ForEach like [this](https://stackoverflow.com/questions/9290498/how-can-i-limit-parallel-foreach#9290531) – stuartd Apr 15 '18 at 20:19
  • Also why do you need to do the same operation on 100 databases? Are they meant to be kept in sync? If so I'd imagine there are DB tools that would handle that for you better than this. – juharr Apr 15 '18 at 20:21
  • If I remember correctly, Parallel.ForEach runs on the thread pool. It will only use so many threads. I would give it a try. I have ran into problems with reusing the DB can connection, but if you are "creating" a connection for each loop, you should be fine. Sorry for the initial post as an answer. The mobile app does that to me, and it doesn't give me a chance to delete on the mobile app. – Doug Apr 15 '18 at 20:28
  • @juharr It is the same operation but different data for each db. Each db has credentials which connect to API, or download file from server and I process the data I get from there. – petko_stankoski Apr 15 '18 at 20:40

1 Answers1

0

If I remember correctly, Parallel.ForEach runs on the thread pool. It uses the scheduler to use figure out how many threads it will use (https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx). However, you can set the number of threads that Parallel.ForEach will use by passing it ParallelOptions with the MaxDegreeOfParallelism (How can I limit Parallel.ForEach?). The only problem with this is if you need chain different database operations together; you would need to use the ContinueWith extension to do that.

Personally, I prefer to load the tasks into a ConcurrentQueue and start a specific count of worker threads to process the queue by calling a method with Task.Run(). This will allow you to requeue any failed task or perform some other notification task. You could do this with Parallel.ForEach, but I think the queue process makes more sense (to me) for this.

Doug
  • 3,472
  • 3
  • 21
  • 18