
I have a .NET 4.5 Single Instance WCF service which maintains a collection of items in a list that will have concurrent readers and writers, but far more readers than writers.

I am currently deciding whether to use the BCL ConcurrentBag<T> or my own custom generic ThreadSafeList class (which implements IList<T> and encapsulates the BCL ReaderWriterLockSlim, since that lock is better suited to multiple concurrent readers).
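
For reference, a minimal sketch of what such a wrapper might look like is below. The real ThreadSafeList isn't shown in the question, so the members here are assumptions, and the full IList<T> surface is omitted for brevity:

using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical sketch only; the actual class in the question implements the full IList<T>.
public class ThreadSafeList<T> : IEnumerable<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(T item)
    {
        _lock.EnterWriteLock();
        try { _items.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    // Enumeration (and therefore LINQ queries such as Where/Sum) runs over a
    // snapshot taken under the read lock, so many readers can proceed in parallel.
    public IEnumerator<T> GetEnumerator()
    {
        _lock.EnterReadLock();
        try { return _items.ToList().GetEnumerator(); }
        finally { _lock.ExitReadLock(); }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}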

I have found significant performance differences when testing these implementations by simulating a concurrent scenario of 1m readers (each simply running a Sum LINQ query) and only 100 writers (each adding an item to the list).

For my performance test I have a list of tasks:

List<Task> tasks = new List<Task>();
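
Note that the shared collection temp is not declared in the snippets that follow; it is assumed to be whichever implementation is under test, along these lines:

using System.Collections.Concurrent;

// Assumed declaration of the shared collection, swapped between test runs:
var temp = new ConcurrentBag<int>();
// var temp = new ThreadSafeList<int>();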

Test 1: If I create 1m reader tasks followed by 100 writer tasks using the following code:

tasks.AddRange(Enumerable.Range(0, 1000000).Select(n => new Task(() => { temp.Where(t => t < 1000).Sum(); })).ToArray());
tasks.AddRange(Enumerable.Range(0, 100).Select(n => new Task(() => { temp.Add(n); })).ToArray());

I get the following timing results:

  • ConcurrentBag: ~300ms
  • ThreadSafeList: ~520ms

Test 2: However, if I create 1m reader tasks interleaved with 100 writer tasks (so that the list of Tasks to be executed could be {Reader, Reader, Writer, Reader, Reader, Writer, etc.}):

foreach (var item in Enumerable.Range(0, 1000000))
{
    tasks.Add(new Task(() => temp.Where(t => t < 1000).Sum()));
    if (item % 10000 == 0)
        tasks.Add(new Task(() => temp.Add(item)));
}

I get the following timing results:

  • ConcurrentBag: ~4000ms
  • ThreadSafeList: ~800ms

My code for getting the execution time for each test is as follows:

Stopwatch watch = new Stopwatch();
watch.Start();
tasks.ForEach(task => task.Start());
Task.WaitAll(tasks.ToArray());
watch.Stop();
Console.WriteLine("Time: {0}ms", watch.Elapsed.TotalMilliseconds);

ConcurrentBag is more efficient than my custom implementation in Test 1 but less efficient in Test 2, so I'm finding it difficult to decide which one to use.

Q1. Why are the results so different when the only thing I’m changing is the ordering of the tasks within the list?

Q2. Is there a better way to change my test to make it more fair?

user978139
  • ConcurrentBag is optimized for the case where the same thread that writes into the bag also reads from it. If you do not have the same thread reading out of the bag, switch over to a ConcurrentQueue instead. Read [this MSDN page](https://msdn.microsoft.com/en-us/library/dd997373(v=vs.110).aspx); in your examples you are using a "pure producer-consumer scenario": each thread either only reads or only writes to the bag, and no single thread does both. – Scott Chamberlain May 29 '15 at 14:20
  • I agree - this then leads me to ask when you'd ever use a ConcurrentBag, because from my understanding you can't have a simultaneous read and write on the same thread... can you? Therefore I'd have each thread maintain its own List. Surely naming the Bag as part of the concurrent collections was a bad idea, given its inefficiency in a multithreaded scenario? ...unless there's something I'm missing! – user978139 May 29 '15 at 14:56
  • Bag is intended for situations where you have a bunch of worker threads that all dump work into a pool and then, when finished, pull out of the same pool. If a worker ends up finishing all the work it put in (because of the way the bag stores data internally, it will get back the data that thread put in first), it will start to "steal" work from other threads. – Scott Chamberlain May 29 '15 at 15:51
  • Or the other common use: workers try to pull from the pool, see there is no item, make a new item, then, when done with the resource, add the new item/return the old item back into the pool. The next time the same thread needs an item from the pool it will be faster, but if another thread accesses the pool it can use the first thread's object without creating its own. A good real-world example of this behaviour is [connection pooling](https://msdn.microsoft.com/en-us/library/8xx3tyca(v=vs.110).aspx). – Scott Chamberlain May 29 '15 at 15:55
  • I think this warranted its own question due to your extensive answers. I think the best way to remember this would be your connection pooling example. – user978139 May 29 '15 at 16:10
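
The pooling pattern described in the comments above might look roughly like the sketch below. SimplePool is a made-up name for illustration, not a BCL type:

using System.Collections.Concurrent;

// Sketch of the pool pattern from the comments: a thread takes an item from the
// bag if one is available, otherwise creates a new one, and returns it when
// finished. Because ConcurrentBag keeps per-thread local storage, a thread tends
// to get back the items it returned cheaply, while other threads can still
// "steal" items when their own local storage is empty.
public class SimplePool<T> where T : new()
{
    private readonly ConcurrentBag<T> _bag = new ConcurrentBag<T>();

    public T Take()
    {
        T item;
        return _bag.TryTake(out item) ? item : new T();
    }

    public void Return(T item)
    {
        _bag.Add(item);
    }
}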

1 Answer


Why are the results so different when the only thing I’m changing is the ordering of the tasks within the list?

My best guess is that Test #1 does not actually read items, as there is nothing to read. The order of task execution is:

  1. Read from shared pool 1M times and calculate sum
  2. Write to shared pool 100 times

Your Test #2 mixes the reads and writes, and this is why, I am guessing, you get a different result.

Is there a better way to change my test to make it more fair?

Before you start the tasks, try randomising their order. It might be difficult to reproduce the same result across runs, but you will get closer to real-world usage.
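
A sketch of that randomisation using a Fisher-Yates shuffle over the task list before starting it (the Shuffle helper here is an assumption, not a BCL method):

using System;
using System.Collections.Generic;

public static class ListExtensions
{
    private static readonly Random Rng = new Random();

    // In-place Fisher-Yates shuffle. System.Random is not thread-safe, so
    // shuffle on a single thread before starting any of the tasks.
    public static void Shuffle<T>(this IList<T> list)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = Rng.Next(i + 1);
            T tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}

Then, before the timing code: tasks.Shuffle(); tasks.ForEach(task => task.Start());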

Ultimately, your question is about the difference between optimistic concurrency (the Concurrent* classes) and pessimistic concurrency (based on a lock). As a rule of thumb, when the chance of simultaneous access to a shared resource is low, prefer optimistic concurrency; when the chance is high, prefer pessimistic concurrency.

oleksii
  • Thanks for your input! In Test 1, changing the order around so that the writes come first and the reads second makes a massive difference in performance: the ConcurrentBag comes out at nearly ~9s, whereas my custom ThreadSafeList takes ~1300ms. I believe Test 2 is a good attempt at randomising the order of tasks? I believe pessimistic concurrency fits my case, as the WCF service is a publish-subscribe server where subscriptions (writing to the list) are rare but the concurrent broadcast (reading from the list) of multiple messages is frequent. – user978139 May 29 '15 at 14:09
  • I'd use something like Fisher-Yates-Durstenfeld shuffle. See implementation [here](http://stackoverflow.com/a/3173893/706456). And when you start tasks, it may look something like: `tasks.Shuffle().ForEach(task => task.Start());` Looks like your implementation is better (but only for your particular use case) – oleksii May 29 '15 at 14:13