
After reading some docs on how to manage DB connections via `DbContext` (which isn't thread safe) inside a `Parallel.ForEach`, I was wondering whether there is a sensible way to tune the `MaxDegreeOfParallelism` property based on the web server's resources combined with the maximum pool size of the `DbContext`, so as not to stress the database excessively when iterating over thousands of records, or whether `MaxDegreeOfParallelism` is simply not the right property to manage in my case.

This is my iteration:

public async Task<T> DoThings()
{
    HashSet<FruizioneDto> theList = // get items;
    
    DbContextOptionsBuilder<WebinarDbContext> builder = new();
    builder.UseLazyLoadingProxies();
    builder.UseSqlServer(Environment.GetEnvironmentVariable("ConnectionString"));
    PooledDbContextFactory<WebinarDbContext> factory = new(options: builder.Options, poolSize: 1024); // <- 1024 is the PooledDbContext default value

    ParallelOptions parallelOptions = new() { MaxDegreeOfParallelism = -1 }; // -1 = unlimited concurrency
    await Parallel.ForEachAsync(theList, parallelOptions, async (fruizione, cancellationToken) =>
    {
        using WebinarDbContext context = factory.CreateDbContext();
        using (IDbContextTransaction transaction = context.Database.BeginTransaction())
        {
            try
            {
                // some business logic...
        
                transaction.Commit();
            }
            catch (FunctionBusinessLogicException ex)
            {
                transaction.Rollback();

                // some exception handling...

                return;
            }
            catch (Exception ex)
            {
                transaction.Rollback();

                // some exception handling...
                return;
            }
        }
    });

    return result; // placeholder: return the actual T-typed result here
}

What I do inside the loop is generate a PDF certificate for a course taken by a user and update some records in the DB on each iteration. What I'm iterating over (`theList`) is a collection of objects of type `FruizioneDto` containing information about the user and the course they followed.

I calculate the valid "credits" for the course, do some replacements on an HTML template, convert it to PDF, and upload it to an Azure storage account. If all these operations succeed, I update the `Fruizione` table in the database and add a record for the newly generated certificate in another table. All of this happens on each iteration. It's a lot of code, but I can share it if needed.

Is there any way to determine the right degree of parallelism in combination with the max pool size of the `DbContext`? Thanks!
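One thing worth separating here (a sketch of the idea, not a definitive rule): `PooledDbContextFactory`'s `poolSize` only controls how many `DbContext` instances are cached in memory; the limit that actually protects the database is the ADO.NET connection pool, set by `Max Pool Size` in the connection string (SQL Server's default is 100). A common starting point is to cap the loop at the smaller of the CPU count and the connections you're willing to consume. The concrete numbers below (`100`, `20`) are assumptions to tune, not recommendations:

```csharp
// Assumption: matches the "Max Pool Size" value in the connection string
// (SQL Server connection pool default is 100).
int maxPoolSize = 100;

// Assumption: headroom left for the rest of the application.
int reservedConnections = 20;

// Cap parallelism at whichever is smaller: CPU count or available connections.
int maxParallelism = Math.Max(1, Math.Min(
    Environment.ProcessorCount,
    maxPoolSize - reservedConnections));

ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = maxParallelism
};
```

With `-1` (unlimited) the loop can start far more concurrent transactions than the connection pool allows, and the surplus workers will block waiting for a pooled connection or eventually time out, which adds contention without adding throughput.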

  • Often, if it makes sense to be performing database work in parallel, it's a mistake to have brought it up into the application layer in the first place - database systems can be great at parallelizing work *if you give them that work to do in a form they can work with*. Without knowing the specific nature of the work (and why you think it makes sense to try parallelizing the work in the application layer rather than the database), it's difficult to offer general guidance. – Damien_The_Unbeliever May 09 '23 at 13:39
  • note [this thread](https://stackoverflow.com/questions/44063832). One conclusion: _"The performance gains with parallel processing in my opinion is insignificant, and this does not justify the need for abandoning Dependency Injection which would create development overhead/maintenance debt, potential for bugs if handled wrong, and a departure from Microsoft's official recommendations."_ – JHBonarius May 09 '23 at 13:41
  • Secondly: if you create a scoped transaction (i.e. put it in a using block or statement), you don't need to manually roll back: [read this](https://learn.microsoft.com/en-us/ef/core/saving/transactions). Instead of handling every error, you only have to call `.Commit` if there's no error. – JHBonarius May 09 '23 at 13:43
  • @JHBonarius actually using parallel foreach the job instead of processing 2000 records in 20 minutes does it in 4 minutes, I think it's not that insignificant. Maybe I'll add more details on what I do inside the loop and thanks for the transaction suggestion! – Matteo Pietro Peru May 09 '23 at 13:50
  • Interesting: maybe you should add your results as an answer to that question. Maybe (probably) the optimal level of parallelism is dependent on the situation. So it's something you have to tune for your application. – JHBonarius May 09 '23 at 13:55
  • Could you try adding the line `ThreadPool.SetMinThreads(100, 100)` before starting the parallel loop, and report what effect has in performance? Also do you have any asynchronous operation (`await`) inside the `body` delegate of the loop? – Theodor Zoulias May 09 '23 at 15:05
  • @TheodorZoulias i will try and i'll tell you! I have some awaits inside the body delegate – Matteo Pietro Peru May 09 '23 at 15:38
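Picking up JHBonarius's suggestion about scoped transactions: in EF Core, a transaction that is disposed without `Commit()` is rolled back automatically, so the loop body can drop the manual `Rollback()` calls. A hedged sketch of the simplified body, reusing the names from the question:

```csharp
await Parallel.ForEachAsync(theList, parallelOptions, async (fruizione, cancellationToken) =>
{
    await using WebinarDbContext context = factory.CreateDbContext();
    await using IDbContextTransaction transaction =
        await context.Database.BeginTransactionAsync(cancellationToken);
    try
    {
        // some business logic...

        await transaction.CommitAsync(cancellationToken);
    }
    catch (FunctionBusinessLogicException ex)
    {
        // No explicit Rollback needed: disposing an uncommitted
        // transaction rolls it back automatically.
        // some exception handling...
    }
});
```

Using `await using` and the async `BeginTransactionAsync`/`CommitAsync` variants also avoids blocking thread-pool threads inside the async delegate.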

0 Answers