7

I'm using Parallel.ForEach and it's hugely improving the performance of my code, but I'm curious about DbContext with multiple threads. I know it's not thread safe so I'm using locks where I need to.

The loop iterates over a dictionary and calculates statistics:

Dictionary<string, List<decimal>> decimalStats = new Dictionary<string, List<decimal>>(); // this gets populated in another irrelevant loop

List<ComparativeStatistic> comparativeStats = db.ComparativeStatistics.ToList();
var statLock = new object();

Parallel.ForEach(decimalStats, entry =>
{
    List<decimal> vals = ((List<decimal>)entry.Value).ToList();

    if (vals.Count > 0)
    {
        string[] ids = entry.Key.Split('#');
        int questionId = int.Parse(ids[0]);
        int yearId = int.Parse(ids[1]);
        int adjacentYearId = int.Parse(ids[2]);

        var stat = comparativeStats.Where(l => l.QuestionID == questionId && l.YearID == yearId && l.AdjacentYearID == adjacentYearId).FirstOrDefault();

        if (stat == null)
        {
            stat = new ComparativeStatistic();
            stat.QuestionnaireQuestionID = questionId;
            stat.FinancialYearID = yearId;
            stat.AdjacentFinancialYearID = adjacentYearId;
            stat.CurrencyID = currencyId;
            stat.IndustryID = industryId;

            lock (statLock) { db.ComparativeStatistics.Add(stat); }
        }

        stat.TimeStamp = DateTime.Now;

        decimal total = 0M;
        decimal? mean = null;

        foreach (var val in vals)
        {
            total += val;
        }

        mean = Decimal.Round((total / vals.Count), 2, MidpointRounding.AwayFromZero);

        stat.Mean = mean;
    }
});

db.SaveChanges();

My question: Why do I only need the lock when I'm adding something to the database? If stat is never null - if there's always already a database entry for it - I can run this loop without a lock with no problems, and the database gets updated as intended. If stat is null for a particular loop and I don't have the lock there, a System.AggregateException gets thrown.

edit1: I've tried opening a new connection to the database each time instead of using lock, which also works when adding to the database (identical to the loop above, I've added comments where it differs):

Parallel.ForEach(decimalStats, entry =>
{
    List<decimal> vals = ((List<decimal>)entry.Value).ToList();

    if (vals.Count > 0)
    {
        using (var dbThread = new PDBContext()) // new db connection
        {
            string[] ids = entry.Key.Split('#');
            int questionId = int.Parse(ids[0]);
            int yearId = int.Parse(ids[1]);
            int adjacentYearId = int.Parse(ids[2]);

            var stat = comparativeStats.Where(l => l.QuestionID == questionId && l.YearID == yearId && l.AdjacentYearID == adjacentYearId).FirstOrDefault();

            if (stat == null)
            {
                stat = new ComparativeStatistic();
                stat.QuestionnaireQuestionID = questionId;
                stat.FinancialYearID = yearId;
                stat.AdjacentFinancialYearID = adjacentYearId;
                stat.CurrencyID = currencyId;
                stat.IndustryID = industryId;

                dbThread.ComparativeStatistics.Add(stat); // no need for a lock
            }

            stat.TimeStamp = DateTime.Now;

            decimal total = 0M;
            decimal? mean = null;

            foreach (var val in vals)
            {
                total += val;
            }

            mean = Decimal.Round((total / vals.Count), 2, MidpointRounding.AwayFromZero);

            stat.Mean = mean;

            dbThread.SaveChanges(); // save
        }
    }
});

Is this safe to do? I'm sure Entity Framework's connection pooling is smart enough but I'm wondering if I should add any parameters to limit the number of threads/connections.

notAnonymousAnymore
  • 2,637
  • 9
  • 49
  • 74
  • The first issue I saw the lock on local variable closure. – Hamlet Hakobyan Apr 21 '14 at 11:23
  • 2
    What is `comparativeStats` here ? Frankly, though, threading is hugely contextual. Without knowing exactly what you are doing, it is hard to comment - it could well be that it is broken in both cases, but that it explodes less frequently in one scenario. "I haven't seen it error" is very different, especially in threading, to "it is error free" – Marc Gravell Apr 21 '14 at 11:24
  • @HamletHakobyan, I'm new to parallel, I took that lock code from another answer here on SO. But it's working so far. – notAnonymousAnymore Apr 21 '14 at 11:29
  • @MarcGravell, thanks a lot for the insight. Is there anything besides locking that will make it not explode in either case? I was thinking of creating a new instance of `DbContext` each time. Like you said it's hugely contextual and this is trial and error for me at this stage. – notAnonymousAnymore Apr 21 '14 at 11:37
  • similar but not the same question: http://stackoverflow.com/questions/12827599/parallel-doesnt-work-with-entity-framework – CAD bloke Apr 03 '16 at 12:12
  • The statement `a System.AggregateException gets thrown` contains about as much information as the statement `something went wrong`. If you want to ask a meaningful question, then provide the full list of exceptions aggregated by that System.AggregateException ***in detail.*** – Mike Nakis Nov 20 '22 at 23:22

1 Answers1

0

Several years later. You are right. The correct solution is to create a new instance of the DbContext inside the Parallel.ForEach loop. The creation of DbContext is leightweight but the context is not thread-safe. Having said that, it's really important to limit the lifetime of a context only to the minimum necessary duration. So please never let it open for a long time after a request (ASP.NET). The best pattern when dealing with DbContexts is the using statement.

There is several documentation available:

https://learn.microsoft.com/en-us/ef/core/dbcontext-configuration/

https://learn.microsoft.com/en-us/ef/ef6/fundamentals/working-with-dbcontext

Sven
  • 2,345
  • 2
  • 21
  • 43