I'm using Parallel.ForEach and it's hugely improving the performance of my code, but I'm curious about using DbContext with multiple threads. I know it's not thread-safe, so I'm using locks where I need to.
The loop iterates over a dictionary and calculates statistics:
    Dictionary<string, List<decimal>> decimalStats = new Dictionary<string, List<decimal>>(); // this gets populated in another irrelevant loop
    List<ComparativeStatistic> comparativeStats = db.ComparativeStatistics.ToList();
    var statLock = new object();
    Parallel.ForEach(decimalStats, entry =>
    {
        List<decimal> vals = ((List<decimal>)entry.Value).ToList();
        if (vals.Count > 0)
        {
            string[] ids = entry.Key.Split('#');
            int questionId = int.Parse(ids[0]);
            int yearId = int.Parse(ids[1]);
            int adjacentYearId = int.Parse(ids[2]);
            var stat = comparativeStats.Where(l => l.QuestionID == questionId && l.YearID == yearId && l.AdjacentYearID == adjacentYearId).FirstOrDefault();
            if (stat == null)
            {
                stat = new ComparativeStatistic();
                stat.QuestionnaireQuestionID = questionId;
                stat.FinancialYearID = yearId;
                stat.AdjacentFinancialYearID = adjacentYearId;
                stat.CurrencyID = currencyId;
                stat.IndustryID = industryId;
                lock (statLock) { db.ComparativeStatistics.Add(stat); }
            }
            stat.TimeStamp = DateTime.Now;
            decimal total = 0M;
            decimal? mean = null;
            foreach (var val in vals)
            {
                total += val;
            }
            mean = Decimal.Round((total / vals.Count), 2, MidpointRounding.AwayFromZero);
            stat.Mean = mean;
        }
    });
    db.SaveChanges();
My question: why do I only need the lock when I'm adding something to the database? If stat is never null (that is, there's always already a database entry for it), I can run this loop without a lock with no problems, and the database gets updated as intended. If stat is null for a particular iteration and I don't have the lock there, a System.AggregateException gets thrown.
Edit 1: I've tried opening a new connection to the database each time instead of using lock, which also works when adding to the database (the code is identical to the loop above; I've added comments where it differs):
    Parallel.ForEach(decimalStats, entry =>
    {
        List<decimal> vals = ((List<decimal>)entry.Value).ToList();
        if (vals.Count > 0)
        {
            using (var dbThread = new PDBContext()) // new db connection
            {
                string[] ids = entry.Key.Split('#');
                int questionId = int.Parse(ids[0]);
                int yearId = int.Parse(ids[1]);
                int adjacentYearId = int.Parse(ids[2]);
                var stat = comparativeStats.Where(l => l.QuestionID == questionId && l.YearID == yearId && l.AdjacentYearID == adjacentYearId).FirstOrDefault();
                if (stat == null)
                {
                    stat = new ComparativeStatistic();
                    stat.QuestionnaireQuestionID = questionId;
                    stat.FinancialYearID = yearId;
                    stat.AdjacentFinancialYearID = adjacentYearId;
                    stat.CurrencyID = currencyId;
                    stat.IndustryID = industryId;
                    dbThread.ComparativeStatistics.Add(stat); // no need for a lock
                }
                stat.TimeStamp = DateTime.Now;
                decimal total = 0M;
                decimal? mean = null;
                foreach (var val in vals)
                {
                    total += val;
                }
                mean = Decimal.Round((total / vals.Count), 2, MidpointRounding.AwayFromZero);
                stat.Mean = mean;
                dbThread.SaveChanges(); // save
            }
        }
    });
Is this safe to do? I assume the connection pooling underneath Entity Framework can cope with this, but I'm wondering whether I should add any parameters to limit the number of threads/connections.
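For reference, this is roughly the kind of limit I have in mind. It is only a minimal sketch: I'm assuming ParallelOptions.MaxDegreeOfParallelism is the relevant setting, the cap of 4 is an arbitrary number I picked, and ThrottledLoopSketch, ComputeMeans and the sample data are placeholder names that are not in my real code. The database work is left out; each iteration would create and dispose its own PDBContext where the comment indicates.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ThrottledLoopSketch
{
    // Computes the rounded mean for each key, running at most maxThreads
    // iterations concurrently. Results go into a thread-safe dictionary.
    public static ConcurrentDictionary<string, decimal> ComputeMeans(
        Dictionary<string, List<decimal>> decimalStats, int maxThreads)
    {
        var means = new ConcurrentDictionary<string, decimal>();

        // Caps the number of concurrent iterations, and so (in the real loop)
        // the number of contexts/connections open at the same time.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.ForEach(decimalStats, options, entry =>
        {
            if (entry.Value.Count == 0)
            {
                return; // skips this entry only, like 'continue' in a normal loop
            }

            // Placeholder: the real loop would open its own PDBContext here,
            // look up or add the ComparativeStatistic, and call SaveChanges().
            decimal mean = Decimal.Round(
                entry.Value.Sum() / entry.Value.Count, 2, MidpointRounding.AwayFromZero);
            means[entry.Key] = mean;
        });

        return means;
    }

    public static void Main()
    {
        // Made-up sample data in the same "questionId#yearId#adjacentYearId" key format.
        var sample = new Dictionary<string, List<decimal>>
        {
            ["1#2#3"] = new List<decimal> { 1.5M, 2.5M },
            ["4#5#6"] = new List<decimal> { 10M },
            ["7#8#9"] = new List<decimal>(),
        };

        var means = ComputeMeans(sample, 4); // hypothetical cap of 4 threads
        foreach (var pair in means)
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

Is a cap like this worth adding, or is leaving the degree of parallelism at its default fine here?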