I have a .NET Core MVC app which does some moderately heavy probability calculations. I am running a parallel loop over a list of ~2.5 million entries. Since it is a parallel loop, I am using a concurrent bag to hold the result objects. For each iteration, I then find the matching entry in my concurrent bag and increment its value, essentially counting the number of times that result occurred. Here is a boiled-down example of what is happening:
```csharp
// results class
public class RandResult
{
    public int id { get; set; }
    public int val { get; set; }
}

// list of ints I iterate over
var intList = new List<int>();
for (var i = 0; i < 2500000; i++)
{
    intList.Add(i);
}

var bagResult = new ConcurrentBag<RandResult>()
{
    new RandResult() { id = 0, val = 0 },
    new RandResult() { id = 1, val = 0 },
    new RandResult() { id = 2, val = 0 },
    new RandResult() { id = 3, val = 0 },
    new RandResult() { id = 4, val = 0 }
};

// timing helpers (declarations omitted from the original snippet)
var watch = new Stopwatch();
var timers = new List<double>();

watch.Restart();
Parallel.ForEach(intList, i =>
{
    bagResult.First(b => b.id == i % 5).val++;
});
timers.Add(watch.ElapsedMilliseconds / 1000.0); // ~1.3 seconds
```
You can see the timers I placed in the code to help evaluate speed. Even with this simple calculation here, that loop takes ~1.3 seconds, almost entirely due to the overhead of the concurrent bag. Given this relative inefficiency, I'm looking for alternatives. Here is what I've tried so far:
Using a regular `List<RandResult>` and a lock:
```csharp
// takes ~0.6 sec
// (listResult is a List<RandResult> seeded with the same five entries as bagResult)
var _lock = new object();
Parallel.ForEach(intList, i =>
{
    lock (_lock)
    {
        listResult.First(b => b.id == i % 5).val++;
    }
});
```
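Since nothing ever adds or removes entries in the list (only each entry's `val` changes), the `First` lookup itself should be safe to run outside the critical section, shrinking the lock to just the increment. A sketch of that variant, not from my benchmarks above — the `Count` wrapper and its seeding are made up so the example is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class RandResult
{
    public int id { get; set; }
    public int val { get; set; }
}

public static class NarrowLock
{
    // Counts occurrences of i % 5 into a pre-seeded List<RandResult>.
    public static List<RandResult> Count(List<int> intList)
    {
        var listResult = Enumerable.Range(0, 5)
            .Select(id => new RandResult { id = id, val = 0 })
            .ToList();
        var _lock = new object();

        Parallel.ForEach(intList, i =>
        {
            // The list structure is never mutated, so the lookup needs no lock...
            var entry = listResult.First(b => b.id == i % 5);
            // ...and only the read-modify-write on val is serialized.
            lock (_lock)
            {
                entry.val++;
            }
        });

        return listResult;
    }
}
```

Whether this actually beats the full lock depends on how expensive the lookup is relative to the increment; with only 5–10 entries the `First` scan is cheap either way.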
Using `Interlocked` was a bit more complicated:
```csharp
// takes ~0.2 sec
var dict = new Dictionary<int, int>()
{
    { 0, 0 }, { 1, 1 }, { 2, 2 }, { 3, 3 }, { 4, 4 }
};
int[] indexes = new int[5] { 0, 1, 2, 3, 4 };
int[] vals = new int[5] { 0, 0, 0, 0, 0 };
Parallel.ForEach(intList, i =>
{
    dict.TryGetValue(i % 5, out int k);
    Interlocked.Increment(ref vals[k]);
});
```
This one is more complicated because the real `id` values won't be consecutive ints, so the `Dictionary` serves as a reverse lookup from `id` to array index.
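One more option along the same lines, which I haven't timed: the `Parallel.ForEach` overload that carries thread-local state lets each thread count into its own private array and fold into the shared totals exactly once per thread, so the hot loop needs neither a lock nor `Interlocked`. A sketch, again with `i % 5` standing in for the real calculation and the `Count` wrapper invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThreadLocalCounts
{
    public static int[] Count(List<int> intList, int buckets = 5)
    {
        var totals = new int[buckets];
        var gate = new object();

        Parallel.ForEach(
            intList,
            // localInit: each worker thread starts with its own private counter array
            () => new int[buckets],
            // body: increment the thread-local array -- no synchronization needed here
            (i, state, local) =>
            {
                local[i % buckets]++;
                return local;
            },
            // localFinally: merge each thread's counts into the shared totals, once per thread
            local =>
            {
                lock (gate)
                {
                    for (var k = 0; k < buckets; k++)
                    {
                        totals[k] += local[k];
                    }
                }
            });

        return totals;
    }
}
```

The `vals` array from the `Interlocked` version maps onto `totals` here, and the same `Dictionary` reverse lookup would translate non-consecutive ids into array indexes.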
The question is, are there any other options?
Note: the actual calculation being done is certainly more complex than `i % 5`, but the real question here is about recording the results, so that serves for the example. Also, even in the full application there will never be more than 10 entries in the List/Bag of `RandResult`.
Bonus question: I'm a bit shocked that the `ConcurrentBag` option was so much slower than everything else. I understand there is quite a bit of overhead involved with parallelism and concurrency in general, but that seems excessive. Does anyone know why it is so much slower?