
I need to develop a procedure that retrieves 100 rows (product items) and performs some very complex calculations on them. On average, each item takes about 0.3s, which means I may face a 30s delay if I perform the calculations serially. The calculation for each item does not depend on the results of the other items, so I am thinking of using C# asynchronous programming features to create threads that will run the calculations in parallel.

The above calculation will be performed in an ASP.NET Core (.NET 6) app that will serve about 10 users.

Until now, I have used asynchronous programming for the purpose of keeping the main thread responsive, so I had no worries about system resources. Now I have to design a procedure that may require 100 x 10 = 1000 threads.

Keep in mind that the calculations are performed in the database, so they do not require any additional resources on the application server.

Should I do it?
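To make this concrete, here is a minimal sketch of the bounded alternative I am weighing, instead of one thread per item x user. `CalculateAsync`, the shortened delay, and the cap of 10 are illustrative placeholders, not my real code:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the per-item database calculation (~0.3s each in reality;
// shortened here for demonstration).
async Task<double> CalculateAsync(int itemId, CancellationToken ct)
{
    await Task.Delay(30, ct);
    return itemId * 2.0;
}

var itemIds = Enumerable.Range(1, 100).ToArray();
var results = new ConcurrentDictionary<int, double>();

// Bound the concurrency instead of letting all 100 operations run unchecked.
var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };
await Parallel.ForEachAsync(itemIds, options, async (id, ct) =>
{
    results[id] = await CalculateAsync(id, ct);
});

Console.WriteLine(results.Count); // 100 results, at most 10 in flight at a time
```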

Theodor Zoulias
Michelos
  • _"s"_ is seconds? – Fildor Nov 29 '22 at 09:24
  • 3
    _"so I am thinking to use c# asynchronous programming features"_ - you shouldn't. You should think of _concurrent_ programming features. – Fildor Nov 29 '22 at 09:25
  • Yes, it is seconds. – Michelos Nov 29 '22 at 09:26
  • 2
    "Keep in mind that the calculations are performed in the database, so calculation does not require any additional resources". How does that follow? You have a database server with an infinite number of cores, that won't mind if you blast 1000 complex concurrent calculations at it? Databases tend to be good at exploiting parallelism themselves for single requests already, so you may well be overtaxing the server if you have no means of concurrency control at all. Those 1000 potential threads on the client side are but one potential issue. – Jeroen Mostert Nov 29 '22 at 09:27
  • 5
    _"Keep in mind that the calculations are performed in the database"_ - so, your code basically only posts a (sql?) query and relays the result, right? Then it may be worthwhile to figure out in what extent the DBMS supports concurrency. – Fildor Nov 29 '22 at 09:28
  • So if I get this right, your CPU has very little to do, making multiple threads pretty pointless. You should check - as Fildor suggests - whether you can perform multiple requests on the database at the same time. – MakePeaceGreatAgain Nov 29 '22 at 09:31
  • 2
    If your calculations are performed on the database side and the client you are using is "truly" async then you potentially will not need 1000 threads server side (the question is how your database server will handle attempt to make so many concurrent requests) – Guru Stron Nov 29 '22 at 09:31
  • 1
    Concurrency and asynchronuity are different things. Concurrentcy means that you have multiple threads doing something in parallel, asynchronuity means that a specific thread can do something while only waiting for a resource to finish its task. – MakePeaceGreatAgain Nov 29 '22 at 09:34
  • As far as I understand, you are more worried about the DB performance than the .NET app. The database is Oracle 19c and the calculations will be performed in PL/SQL packages. I have used Oracle a lot, so I don't understand why you are afraid of not being able to perform multiple requests. – Michelos Nov 29 '22 at 09:36
  • 1
    @MakePeaceGreatAgain *"Concurrency means that you have multiple threads doing something in parallel,"* -- Not exactly. You can have concurrency with zero threads. Check this out: [What is the difference between concurrency, parallelism and asynchronous methods?](https://stackoverflow.com/questions/4844637/what-is-the-difference-between-concurrency-parallelism-and-asynchronous-methods) – Theodor Zoulias Nov 29 '22 at 09:43
  • 2
    You might find this interesting: [Can the Oracle managed driver use async/await properly?](https://stackoverflow.com/questions/29016698/can-the-oracle-managed-driver-use-async-await-properly) – Theodor Zoulias Nov 29 '22 at 09:47
  • 1
    @Michelos if you are using Oracle you will need 1000 threads, cause their client is not truly async =) – Guru Stron Nov 29 '22 at 09:47
  • 1
    `Parallel.ForEach` is the decision for you as it decides by itself how to parallelise your tasks according to the environment (number of cores etc) https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/how-to-write-a-simple-parallel-foreach-loop – Roman Ryzhiy Nov 29 '22 at 09:58
  • @RomanRyzhiy contrary to what the docs say, the `Parallel.ForEach` doesn't decide anything. It just saturates the `ThreadPool`. It takes all the `ThreadPool` threads that are currently available, and asks for more. Essentially it owns the `ThreadPool`, starving any other concurrent operation that might depend on the `ThreadPool`. My suggestion is to always specify the `MaxDegreeOfParallelism` when you use the `Parallel` class. Microsoft [recommends otherwise](https://github.com/dotnet/runtime/issues/72981#issuecomment-1198200618). – Theodor Zoulias Nov 29 '22 at 11:56

1 Answer


If whatever calculation you are running is compute limited, there is no reason to use more threads than you have logical CPU cores. There might be reasons to use fewer threads, to reserve some resources for other things, like keeping the UI responsive. A `Parallel.For` would be a typical solution for running compute-limited code concurrently; it will automatically scale the number of threads used, but also allows a maximum to be set if you want to reserve some cores.
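For example, a sketch of a capped `Parallel.For` (the squaring workload and the reserved-core count are just illustrative):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int[] inputs = Enumerable.Range(1, 1_000).ToArray();
long[] outputs = new long[inputs.Length];

// Reserve one core for other work; never let the cap drop below 1.
int maxDop = Math.Max(1, Environment.ProcessorCount - 1);
var options = new ParallelOptions { MaxDegreeOfParallelism = maxDop };

Parallel.For(0, inputs.Length, options, i =>
{
    outputs[i] = (long)inputs[i] * inputs[i]; // stand-in for a heavy computation
});

Console.WriteLine(outputs[9]); // 100
```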

If you are IO limited you do not really need to use any threads at all. As long as you are using "true" asynchronous calls, no threads are consumed while the IO system is working. But note that IO operations may not scale well with higher concurrency, since there are hardware limits. This is especially true if you have spinning disks.
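One common way to bound in-flight async IO without dedicating threads is a `SemaphoreSlim` throttle. A minimal sketch, where `FakeDbCallAsync` and the limit of 8 are illustrative placeholders for a truly asynchronous database call:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var throttle = new SemaphoreSlim(8); // at most 8 requests in flight

// Placeholder for a truly asynchronous database call; no thread is
// blocked while the await is pending.
async Task<int> FakeDbCallAsync(int id)
{
    await Task.Delay(10);
    return id * 10;
}

var tasks = Enumerable.Range(1, 100).Select(async id =>
{
    await throttle.WaitAsync();
    try { return await FakeDbCallAsync(id); }
    finally { throttle.Release(); }
});

int[] results = await Task.WhenAll(tasks);
Console.WriteLine(results.Sum()); // 50500
```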

If your workload is a mix of compute and IO you might want to pipeline the IO and the compute, so take a look at Dataflow. If most of the work is performed by the database, you may need to just try and see how many threads can be used before performance starts to drop. Databases involve a mix of IO and compute, but also things like locks and semaphores that might add additional limits. You should also check your queries to ensure that they are as efficient as possible; there is no need to spend a bunch of time optimizing concurrency if an index will make the queries 100 times faster.
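A minimal sketch of such a pipeline with TPL Dataflow (requires the `System.Threading.Tasks.Dataflow` NuGet package; the stage bodies and degree-of-parallelism values are illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet: System.Threading.Tasks.Dataflow

int processed = 0;

// IO stage: throttled async "fetch" (the delay is a stand-in for a DB read).
var fetch = new TransformBlock<int, string>(async id =>
{
    await Task.Delay(5);
    return $"row-{id}";
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });

// Compute stage: CPU-bound post-processing, scaled to the core count.
var compute = new ActionBlock<string>(row =>
{
    Interlocked.Increment(ref processed); // stand-in for real per-row work
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

fetch.LinkTo(compute, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 0; i < 100; i++) fetch.Post(i);
fetch.Complete();
await compute.Completion;

Console.WriteLine(processed); // 100
```

Linking with `PropagateCompletion = true` lets the IO stage's completion flow into the compute stage, so awaiting the last block drains the whole pipeline.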

JonasH