
I am building a module (a .NET executable) that has to process a large data set (100 million records) from a NoSQL database (MongoDB) in the shortest possible time. The logic includes costly operations such as encryption and decryption, and the data is critical, so we need to be very careful with it.

The basic logic is in place, but it currently runs very slowly (about 50 records per 5 minutes on a single main thread). To multi-thread it, I am thinking of using the Task Parallel Library, where I see two possible approaches:

  1. Using Parallel.For: this is the easier approach; the loop iterations are spread across multiple threads.
  2. Using separate tasks for batches: each task gets its own lower and upper bound, so the executions are kept separate and do not interfere with each other. With this method we still need to figure out how to handle individual task failures properly.
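
To make the comparison concrete, here is a minimal sketch of both approaches. It is not the actual module: `ProcessRecord` is a hypothetical stand-in for the real per-record work (read, decrypt, transform, encrypt, write), the record ids are assumed to be a contiguous integer range, and no MongoDB calls are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Hypothetical stand-in for the real per-record work.
    public static int ProcessRecord(int id)
    {
        if (id < 0) throw new ArgumentOutOfRangeException(nameof(id));
        return id * 2;
    }

    // Approach 1: Parallel.For over the whole range, with a thread-local
    // partial sum so that threads only synchronize once, when they finish.
    public static long RunParallelFor(int count, int maxDegree)
    {
        long total = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        Parallel.For(0, count, options,
            () => 0L,                                      // per-thread initial value
            (i, state, local) => local + ProcessRecord(i), // loop body
            local => Interlocked.Add(ref total, local));   // merge per-thread result
        return total;
    }

    // Approach 2: one task per batch with explicit [Start, End) bounds.
    // Failed batches are returned so the caller can retry or log them.
    public static async Task<(long Total, List<(int Start, int End)> Failed)>
        RunBatchesAsync(int count, int batchSize)
    {
        var batches = new List<(int Start, int End)>();
        for (int start = 0; start < count; start += batchSize)
            batches.Add((start, Math.Min(start + batchSize, count)));

        Task<long>[] tasks = batches.Select(b => Task.Run(() =>
        {
            long sum = 0;
            for (int i = b.Start; i < b.End; i++) sum += ProcessRecord(i);
            return sum;
        })).ToArray();

        try { await Task.WhenAll(tasks); }
        catch { /* each task is inspected individually below */ }

        long total = 0;
        var failed = new List<(int Start, int End)>();
        for (int k = 0; k < tasks.Length; k++)
        {
            if (tasks[k].Status == TaskStatus.RanToCompletion) total += tasks[k].Result;
            else failed.Add(batches[k]);
        }
        return (total, failed);
    }

    public static void Main()
    {
        Console.WriteLine(RunParallelFor(1000, Environment.ProcessorCount));
        var (total, failed) = RunBatchesAsync(1000, 100).GetAwaiter().GetResult();
        Console.WriteLine($"{total}, failed batches: {failed.Count}");
    }
}
```

In this sketch the batch version makes failure handling explicit, because a failed task maps back to a known range of records. With `Parallel.For`, exceptions from the loop body surface together as an `AggregateException`. Which one is faster for the real workload is something only a measurement can settle.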

The main concern here is execution time. Which of these approaches can give me better throughput? Or is there another method I could use?

I am building POCs for both, but any guidance would be helpful.

halfer
bhuvin
  • The only answer with this little information is: benchmark. It will show you how your environment and code would scale. – Sami Kuhmonen May 23 '17 at 05:08
  • You need to change your approach at a higher level. Consider a scenario where your performance increases 100 times, which is very unlikely. Doing the math, you then get 1,440,000 records per day (50 records per 5 minutes is 14,400 per day, times 100), which is still slow, and I doubt it is a suitable time frame for you either. Otherwise, change the algorithm that handles the data from the DB. – Karolis Kajenas May 23 '17 at 05:13
  • Only a benchmark will tell for sure but a quick search suggests that Parallel can be faster e.g. https://stackoverflow.com/a/18174382/2401021 or https://stackoverflow.com/a/5009224/2401021. – majita May 23 '17 at 05:34
  • It depends on your logic. Normally, you should group the tasks that can run in parallel and have no dependencies between them under a `Parallel.For`. Then put the remaining tasks in task groups and set the dependencies between those tasks. – Haitham Shaddad May 23 '17 at 05:50
  • Sorry, we can't really give you an answer, as optimizations often depend on the specifics of the problem being solved, and you didn't provide enough information for us to know those specifics. – Euphoric May 23 '17 at 06:17
  • "P.S: I am stuck with a tight deadline." You are doing it wrong. REALLY WRONG. Performance requirements should have been known from the start. If you are only now trying to figure out a solution to your problem that is also fast, then you will have to reimplement your solution. – Euphoric May 23 '17 at 06:18
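
Several comments above come down to "benchmark" and "do the math". A minimal sketch of both follows, assuming the per-record work can be passed in as a delegate. `RecordsPerSecond`, `ProjectedDays` and the spin-wait workload are hypothetical helpers for illustration, not part of the asker's module.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ThroughputBenchmark
{
    // Measures throughput of `work` over a sample of records at a given degree of parallelism.
    public static double RecordsPerSecond(int sampleSize, int maxDegree, Action<int> work)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        var sw = Stopwatch.StartNew();
        Parallel.For(0, sampleSize, options, work);
        sw.Stop();
        return sampleSize / sw.Elapsed.TotalSeconds;
    }

    // Projects how many days the full data set would take at a measured rate.
    public static double ProjectedDays(long totalRecords, double recordsPerSecond)
        => totalRecords / recordsPerSecond / 86400.0;

    public static void Main()
    {
        // Placeholder CPU-bound workload; replace with the real per-record logic.
        Action<int> work = id => Thread.SpinWait(100_000);

        foreach (int degree in new[] { 1, 2, 4, 8 })
        {
            double rate = RecordsPerSecond(2_000, degree, work);
            Console.WriteLine(
                $"degree={degree}: {rate:F0} records/s, " +
                $"{ProjectedDays(100_000_000, rate):F2} days for 100 million records");
        }

        // The question's figures: 50 records per 5 minutes, sped up 100 times,
        // is 1000 records per minute.
        Console.WriteLine($"{ProjectedDays(100_000_000, 1000.0 / 60):F1} days at 100x the current rate");
    }
}
```

With the numbers from the question, even a 100-fold speedup leaves roughly 69 days for 100 million records. That suggests the per-record cost itself needs attention, not only the degree of parallelism.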

0 Answers