
I am migrating a C# ASP.NET Core 7 project from SqlClient with raw SQL queries to Entity Framework. In one particular place the application runs multiple long-running tasks: it is essentially a simulation with a big for loop whose progress the user can follow, and for that reason each task writes to the database dozens of times on its own. The old SqlClient solution ran smoothly with minimal CPU and memory usage, but with EF, as soon as the threads start working, everything halts and freezes.

I know that DbContext is not thread-safe, so each task creates its own DbContext, right where the database inserts occur, and I dispose of it as soon as it is no longer needed. Even so, once the for loop is running, the computer freezes completely and everything stops; the web application no longer responds at all.

The simplified controller:

    public SmContext db { get; set; }

    public SimulateRoundModel(SmContext db)
    {
        this.db = db;
    }

    public async Task<IActionResult> OnPost()
    {
        List<Match> matches = new CollectorClass(db).Collect();
        MyClass.Wrapper(matches);
        return Page();
    }

The simplified code:

public static void Wrapper(List<Match> matches)
{
    Parallel.For(0, matches.Count,
           index =>
           {
               matches[index].longSim();
           });
}

Match class:

private SmContext db { get; set; }

public Match(SmContext db)
{
    this.db = db;
}

public void longSim()
{
    db.Dispose(); // dispose the shared DbContext received from the constructor; we don't want to use it here

    using (SmContext db = new SmContext())
    {
        // some initial query and insert
    }

    for (int i = 0; i < 100; i++)
    {
        Thread.Sleep(5000);

        // some simulation

        db = new SmContext();

    SomeInsert(); // these methods use the db field for their inserts
        SomeInsert();
        SomeInsert();

        db.Dispose();
    }
}

We are talking about 5-50 matches, and Parallel.For handled them very well with the old SqlClient solution; I have seen it run with 200 matches without an issue. These are not intensive tasks, just simple work and a few queries, but they run for a long time. Ideally, I would like to keep saving the progress to the database without a major rewrite.

The ultimate question is: is there a conceptual issue here that I am too much of a newbie to recognize, or should this approach work fine, which would mean something fishy is going on in the parts of the code I have not shown?

  • Which driver and exact version? – ErikEJ Mar 04 '23 at 06:41
  • Could you try configuring the `Parallel.For` with a small `ParallelOptions.MaxDegreeOfParallelism`, like `2`, and see what happens? – Theodor Zoulias Mar 04 '23 at 08:32
  • Btw using the `SmContext` in three different ways (a shared context passed in the constructor, a short-lived context with `using`, and a long-lived context created by `longSim` and cached) looks like a recipe for disaster to me. Not to mention that it makes your question harder to answer. Could you eliminate the `private SmContext db { get; set; }` from the equation, in order to reduce the unknowns in the question? – Theodor Zoulias Mar 04 '23 at 08:38
  • @TheodorZoulias Yeah, it is a great idea, but unfortunately MaxDegreeOfParallelism is not really an option for me; all tasks must run concurrently. This stuff starts at a specific time, and each user must be able to watch their progress at that time, not 10 minutes later or so. Also, eliminating the private property can be problematic too, as this class has tons of other functions that use it from other parts of this app; it would be a major rewrite. – user1734337 Mar 04 '23 at 09:01
  • In that case I would suggest making your intentions explicit by configuring the `Parallel.For` with `MaxDegreeOfParallelism = -1`, and also ensuring that the `ThreadPool` can immediately create threads on demand for all the `matches` by configuring `ThreadPool.SetMinThreads`. For example: `ThreadPool.SetMinThreads(workerThreads: 1000, completionPortThreads: 100);` (a rough sketch follows these comments). But playing with small `MaxDegreeOfParallelism` values can still be useful for debugging purposes. It might give us clues about what's going on with your code. – Theodor Zoulias Mar 04 '23 at 09:43
  • @TheodorZoulias Alrighty will test it today, sounds good. I know my concept of using the DbContext is not ideal, but hopefully not fundamentally wrong. I hope it will solve the peak. – user1734337 Mar 04 '23 at 11:58
  • Also consider [context pooling](https://learn.microsoft.com/en-us/ef/core/performance/advanced-performance-topics?tabs=with-di%2Cwith-constant#dbcontext-pooling). – Gert Arnold Mar 04 '23 at 12:01
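
A rough sketch of the configuration suggested in the comments above; the thread counts and the pooling registration line are illustrative assumptions, not code from the question:

    // Warm up the thread pool so it can immediately serve one thread per match,
    // and make the intent of running all matches concurrently explicit.
    ThreadPool.SetMinThreads(workerThreads: 1000, completionPortThreads: 100);

    var options = new ParallelOptions { MaxDegreeOfParallelism = -1 };
    Parallel.For(0, matches.Count, options, index =>
    {
        matches[index].longSim();
    });

    // Context pooling would be registered roughly like this in Program.cs:
    // builder.Services.AddDbContextPool<SmContext>(o => o.UseSqlServer(connectionString));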

1 Answer


This is more in guess territory than something I can prove, but from my experience the multiple SomeInsert calls with the same context look a bit suspicious. EF Core performs insert/update operations by relying on change tracking, and even if you use AsNoTracking, new entries will still be handled by the change tracker. So if you are actually inserting a lot of data (and note that EF has never been well suited to batch inserts), you will end up with a change tracker holding a lot of entities, which can slow down EF performance considerably. I would suggest one of the following options:

  • Call ChangeTracker.Clear after inserting a considerable number of entities* (this can also be used instead of recreating the context outside the loop); see the sketch after this list
  • Recreate the context after inserting a considerable number of entities*
  • Use another technology or an extension library that supports bulk inserts (EFCore.BulkExtensions, for example)

* - you will need to determine the optimal number of inserted entities after which to recreate the context / clear the tracker and call SaveChanges, as was done for an older iteration of EF in this answer.
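
A minimal sketch of the first two options, assuming a hypothetical `entitiesToInsert` collection and an arbitrary batch size (this is not code from the answer): save and clear the tracker every N inserts so the number of tracked entities stays bounded.

    const int batchSize = 100; // assumed value; tune it for your workload
    int pending = 0;

    using (var db = new SmContext())
    {
        foreach (var entity in entitiesToInsert) // hypothetical collection of new rows
        {
            db.Add(entity);
            pending++;

            if (pending >= batchSize)
            {
                db.SaveChanges();
                db.ChangeTracker.Clear(); // detach everything so the tracker starts empty again
                pending = 0;
            }
        }

        if (pending > 0)
        {
            db.SaveChanges();
        }
    }

Recreating the context after each batch instead of calling `ChangeTracker.Clear()` would have roughly the same effect on the tracker.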

P.S.

    Parallel.For
    public void longSim()
    Thread.Sleep(5000);

I would strongly advise making longSim asynchronous, using await Task.Delay(5000) instead of Thread.Sleep, and switching to Parallel.ForEachAsync, which supports async delegates. This will also allow you to use the async versions of the EF Core methods.
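
A rough sketch of what that could look like; the method name `LongSimAsync` and the cancellation token handling are illustrative assumptions, not code from the question:

    public static async Task WrapperAsync(List<Match> matches)
    {
        await Parallel.ForEachAsync(matches, async (match, ct) =>
        {
            await match.LongSimAsync(ct);
        });
    }

    // In the Match class, a hypothetical async counterpart of longSim:
    public async Task LongSimAsync(CancellationToken ct)
    {
        for (int i = 0; i < 100; i++)
        {
            await Task.Delay(5000, ct); // waits without blocking a thread-pool thread

            using (var db = new SmContext())
            {
                // ... simulation and inserts ...
                await db.SaveChangesAsync(ct);
            }
        }
    }

Note that `Parallel.ForEachAsync` limits concurrency to `Environment.ProcessorCount` by default, so if all matches must run at once you would pass a `ParallelOptions` with `MaxDegreeOfParallelism` set accordingly.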

One more thing worth taking into consideration is thread pool starvation, which can sometimes have somewhat similar "side" effects; but if the only change you made is the switch from SqlClient to EF Core, and that alone leads to the observed behaviour, then thread pool starvation should not be the reason.

Guru Stron