
I am trying to write a program where I schedule items for removal by putting them in a collection from different threads, and clean them up in a single thread that iterates over the collection and disposes the items.

Before doing so, I wondered what would yield the optimal performance, so I tried ConcurrentBag, ConcurrentStack and ConcurrentQueue and measured the time required to add 10,000,000 items.

I used the following program to test this:

class Program
{
    static List<int> list = new List<int>();
    static ConcurrentBag<int> bag = new ConcurrentBag<int>();
    static ConcurrentStack<int> stack = new ConcurrentStack<int>();
    static ConcurrentQueue<int> queue = new ConcurrentQueue<int>();
    static void Main(string[] args)
    {
        run(addList);
        run(addBag);
        run(addStack);
        run(addQueue);
        Console.ReadLine();
    }

    private static void addList(int obj) { lock (list) { list.Add(obj); } }

    private static void addStack(int obj) { stack.Push(obj); }

    private static void addQueue(int obj) { queue.Enqueue(obj); }

    private static void addBag(int obj) { bag.Add(obj); }



    private static void run(Action<int> action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        Parallel.For(0, 10000000, new ParallelOptions() { MaxDegreeOfParallelism = # },
            action);
        stopwatch.Stop();
        Console.WriteLine(action.Method.Name + " takes " + stopwatch.Elapsed);
    }
}

where # is the number of threads used.

But the results are rather confusing:

With 8 threads:

  • addList takes 00:00:00.8166816
  • addBag takes 00:00:01.0368712
  • addStack takes 00:00:01.0902852
  • addQueue takes 00:00:00.6555039

With 1 thread:

  • addList takes 00:00:00.3880958
  • addBag takes 00:00:01.5850249
  • addStack takes 00:00:01.2764924
  • addQueue takes 00:00:00.4409501

So, no matter how many threads, it seems that simply locking a plain old list is faster than using any of the concurrent collections, except perhaps the queue if it needs to handle a lot of writes.

EDIT: after the comments below about garbage collection and the Debug build: yes, these influence the benchmark. The Debug build's influence would be roughly linear, while the garbage collection overhead would grow with higher memory usage.

Yet running the same test multiple times gives roughly the same results.

I moved the initialization of the collection to right before the test run, and I now collect the garbage after the run, like this:

list = new List<int>();
run(addList);
list = null;
GC.Collect();

With MaxDegreeOfParallelism set to 8 I get the following results:

  • addList takes 00:00:00.7959546
  • addBag takes 00:00:01.08023823
  • addStack takes 00:00:01.1354566
  • addQueue takes 00:00:00.6597145

with a deviation of about 0.02 seconds each time I run the code.

Stijn Tallon
    What exactly is your question? – tnw Oct 11 '13 at 20:00
  • Use the collection that functionally does what you need. Why would you consider a List, given that the items would not come off the List when you process them? Why would you base a decision on the time to add 1 million items when that is only part of what you need? – paparazzo Oct 11 '13 at 20:09
  • You are making the standard benchmark mistakes. You are running the Debug build and you are measuring jitting and garbage collection overhead. Run the Release build without a debugger, loop only 1000 times, call GC.Collect() before every run() call, and repeat the group of tests 10 times. Note the drastically different numbers you'll get, and how poorly they repeat. – Hans Passant Oct 11 '13 at 20:19
  • I've made the changes you suggested, and the fact remains that even with 1000 items, the bag remains much slower than the locked list: 0.0035 sec for the list vs 0.0138 sec for the bag. – Stijn Tallon Oct 12 '13 at 21:50
  • The `ConcurrentQueue` is the correct collection for your scenario. – Theodor Zoulias Feb 03 '23 at 03:42
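
For reference, a minimal harness along the lines of Hans Passant's comment above (Release build without a debugger, a much smaller loop, a GC.Collect() before every run, and repeating the group of tests) might look like the sketch below. The method name and default parameters are hypothetical and not part of the original question:

// Hypothetical sketch of a benchmark run that follows the advice in the comments:
// build in Release, warm up the JIT, force a garbage collection before each
// measurement, and repeat the test to see how stable the numbers are.
// (Re-creating the collection between repeats is omitted for brevity.)
private static void runRepeated(Action<int> action, int iterations = 1000, int repeats = 10)
{
    action(0); // warm-up call so JIT compilation is not included in the timing

    for (int r = 0; r < repeats; r++)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Stopwatch stopwatch = Stopwatch.StartNew();
        Parallel.For(0, iterations, new ParallelOptions() { MaxDegreeOfParallelism = 8 }, action);
        stopwatch.Stop();
        Console.WriteLine(action.Method.Name + " run " + (r + 1) + " takes " + stopwatch.Elapsed);
    }
}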

4 Answers


The concurrent collections are not always faster. Most of them only see performance gains at higher levels of contention, and the actual workload has an impact as well. Check out this paper from the PFX team :)

http://blogs.msdn.com/b/pfxteam/archive/2010/04/26/9997562.aspx

Beware of premature optimization, though. Put something together that works and then optimize, especially since the actual workload is important. Also, having locks as a perf bottleneck is pretty rare; usually there is some IO or another algorithm that takes far longer :)
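
To illustrate the point about contention and workload, a hypothetical variation of the benchmark in the question is to do some work per item inside the delegate, so that the add itself is no longer the dominant cost (addQueueWithWork is an invented name, not from the question or this answer):

// Hypothetical variation of the original benchmark: simulate some CPU work per
// item so that the time spent inside the collection (or inside the lock) is
// only a small fraction of each iteration.
private static void addQueueWithWork(int obj)
{
    double x = obj;
    for (int i = 0; i < 100; i++)       // simulated per-item work
        x = Math.Sqrt(x + i);
    if (x >= 0) queue.Enqueue(obj);     // condition only keeps the work from being optimized away
}

With more work per iteration, the threads collide on the collection less often, so the relative gap between the locked List<int> and the lock-free collections generally shrinks.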

aL3891
  • How can the level of contention be higher than 8 threads constantly trying to modify the collection while there is no other load to give the collection a break? – Stijn Tallon Oct 13 '13 at 08:51
  • Well, you can have more than 8 CPUs, and more threads :) The slowdown with locks comes from the context switching caused by the lock, and that is not bound by hardware threads. With fewer threads, the same thread might get lucky and get the lock again. Besides, `Parallel.For` will try to minimize thread switches, executing several loop iterations at a time. – aL3891 Oct 13 '13 at 12:17
  • More than 8 CPUs is currently a rare sight. Doesn't having more threads than CPUs increase the level of contention, or does it? If one thread is accessing the collection, at least one other thread is not doing anything, because the CPU(s) are busy with other threads? – Stijn Tallon Oct 23 '13 at 09:13
  • Servers commonly have more than 8 cores, but perhaps that's not what you're targeting :) True, with say one core, only one thread is active at a time, but all the other threads have to be woken and run too, and that takes a lot of time, so you still don't want too many threads. – aL3891 Oct 24 '13 at 21:57
  • Yes, context switching isn't free, but if the number of threads equals the number of cores, the time lost should be roughly the time spent in the locked state? – Stijn Tallon Nov 01 '13 at 20:46
  • Absolutely, assuming you have no IO or other blocking operations :) If you do, you'll end up with cores doing nothing while one of the 8 threads is blocked. – aL3891 Nov 02 '13 at 09:31
  • @StijnTallon Threading is about continuing to do work whenever you are forced to wait, which often has nothing to do with CPU cores. Consider a process that pulls data from one database server, performs heavy calculations on it, writes to another database server, writes to files on the local HD, and visualizes data for the user. Even with 1 core, you might benefit from 12 or more threads for this process, since you can wait on database locks, file locks, network, CPU, GPU, disk IO, etc... and requests to/from the same server may also be asynchronous for these same reasons! Very situational. – Daniel Jun 07 '18 at 14:55

Don't forget that you not only have to add the items to the collection, but also have to retrieve them. So a fairer comparison is between a Monitor-based Queue<T> and a BlockingCollection<T>, each with 8 producers and 1 consumer.

Then I get the following results on my machine (I've increased the number of iterations by a factor of 10):

  • AddQueue1 takes 00:00:18.0119159
  • AddQueue2 takes 00:00:13.3665991

But it's not only performance that's interesting. Have a look at the two approaches: it's very difficult to check AddQueue1/ConsumeQueue1 for correctness, while it's very easy to see that AddQueue2/ConsumeQueue2 does exactly what's intended, thanks to the abstraction provided by BlockingCollection<T>.


static Queue<int> queue1 = new Queue<int>();
static BlockingCollection<int> queue2 = new BlockingCollection<int>();

static void Main(string[] args)
{
    Run(AddQueue1, ConsumeQueue1);
    Run(AddQueue2, ConsumeQueue2);
    Console.ReadLine();
}

private static void AddQueue1(int obj)
{
    lock (queue1)
    {
        queue1.Enqueue(obj);
        if (queue1.Count == 1)
            Monitor.Pulse(queue1);
    }
}

private static void ConsumeQueue1()
{
    lock (queue1)
    {
        while (true)
        {
            while (queue1.Count == 0)
                Monitor.Wait(queue1);
            var item = queue1.Dequeue();
            // do something with item
        }
    }
}

private static void AddQueue2(int obj)
{
    queue2.TryAdd(obj);
}

private static void ConsumeQueue2()
{
    foreach (var item in queue2.GetConsumingEnumerable())
    {
        // do something with item
    }
}

private static void Run(Action<int> action, ThreadStart consumer)
{
    new Thread(consumer) { IsBackground = true }.Start();
    Stopwatch stopwatch = Stopwatch.StartNew();
    Parallel.For(0, 100000000, new ParallelOptions() { MaxDegreeOfParallelism = 8 }, action);
    stopwatch.Stop();
    Console.WriteLine(action.Method.Name + " takes " + stopwatch.Elapsed);
}
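
As a side note (not part of the original answer): the consumer above is a background thread that loops forever. If the producers are finite, one option with the BlockingCollection<T> approach is to mark the collection complete once the Parallel.For returns, so that GetConsumingEnumerable() ends on its own. A minimal sketch, reusing the queue2 field from above:

// Sketch: after all producers are done, signal that no more items will be added;
// the consumer's GetConsumingEnumerable() loop then finishes on its own.
Parallel.For(0, 100000000, new ParallelOptions() { MaxDegreeOfParallelism = 8 }, AddQueue2);
queue2.CompleteAdding();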
dtb
  • Assuming the threads run completely separately from each other, this is true. But I forgot to mention that my consumer thread is synchronized with the producer threads; by this I mean that while my producers are running, the consumer is guaranteed to be asleep, and vice versa. So I am actually only concerned that all items are in the bag after the producers are done working, and that the items get added to the bag as quickly as possible. Order is of no concern. – Stijn Tallon Oct 12 '13 at 21:45

I wanted to see a comparison of performance for adding as well as taking. Here is the code I used:

class Program
{
    static List<int> list = new List<int>();
    static ConcurrentBag<int> bag = new ConcurrentBag<int>();
    static ConcurrentStack<int> stack = new ConcurrentStack<int>();
    static ConcurrentQueue<int> queue = new ConcurrentQueue<int>();
    static void Main(string[] args)
    {
        list = new List<int>();
        run(addList);
        run(takeList);

        list = null;
        GC.Collect();

        bag = new ConcurrentBag<int>();
        run(addBag);
        run(takeBag);

        bag = null;
        GC.Collect();

        stack = new ConcurrentStack<int>();
        run(addStack);
        run(takeStack);

        stack = null;
        GC.Collect();

        queue = new ConcurrentQueue<int>();
        run(addQueue);
        run(takeQueue);

        queue = null;
        GC.Collect();

        Console.ReadLine();
    }

    private static void takeList(int obj)
    {
        lock (list)
        {
            if (list.Count == 0)
                return;

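            // Note: unlike TryPop/TryDequeue/TryTake below, this only reads list[obj];
            // it never removes anything from the list.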
            int output = list[obj];
        }
    }

    private static void takeStack(int obj)
    {
        stack.TryPop(out int output);
    }

    private static void takeQueue(int obj)
    {
        queue.TryDequeue(out int output);
    }

    private static void takeBag(int obj)
    {
        bag.TryTake(out int output);
    }

    private static void addList(int obj) { lock (list) { list.Add(obj); } }

    private static void addStack(int obj) { stack.Push(obj); }

    private static void addQueue(int obj) { queue.Enqueue(obj); }

    private static void addBag(int obj) { bag.Add(obj); }



    private static void run(Action<int> action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        Parallel.For(0, 10000000, new ParallelOptions()
        {
            MaxDegreeOfParallelism = 8
        }, action);
        stopwatch.Stop();
        Console.WriteLine(action.Method.Name + " takes " + stopwatch.Elapsed);
    }
}

And the output is:

  • addList takes 00:00:00.8875893
  • takeList takes 00:00:00.7500289
  • addBag takes 00:00:01.8651759
  • takeBag takes 00:00:00.5749322
  • addStack takes 00:00:01.5565545
  • takeStack takes 00:00:00.3838718
  • addQueue takes 00:00:00.8861318
  • takeQueue takes 00:00:01.0510706
  • In order: takeStack 0.384, takeBag 0.575, takeList 0.750, addQueue 0.886, addList 0.888, takeQueue 1.051, addStack 1.557, addBag 1.865. – Patrick Jun 17 '19 at 07:54

Yes, but the whole point is that you need some real concurrency, with more than one thread. Run this over a long period with concurrent producers and consumers to look at average performance, as a test like this does not take the locking strategy of the different collections into account.
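
A hypothetical sketch of such a longer-running test, with 8 producers and a single consumer active at the same time (all names and numbers here are made up for illustration, not taken from this answer):

// Hypothetical sketch: measure sustained throughput with 8 producers and one
// consumer running concurrently, instead of timing the adds in isolation.
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThroughputTest
{
    static void Main()
    {
        var queue = new ConcurrentQueue<int>();
        var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
        long produced = 0, consumed = 0;
        bool producersDone = false;

        // 8 producer tasks enqueue items until the 10 seconds are up.
        var producers = Enumerable.Range(0, 8).Select(_ => Task.Run(() =>
        {
            while (!cts.IsCancellationRequested)
            {
                queue.Enqueue(1);
                Interlocked.Increment(ref produced);
            }
        })).ToArray();

        // A single consumer dequeues concurrently and then drains the remainder.
        var consumer = Task.Run(() =>
        {
            while (!Volatile.Read(ref producersDone) || !queue.IsEmpty)
            {
                if (queue.TryDequeue(out _))
                    Interlocked.Increment(ref consumed);
            }
        });

        Task.WaitAll(producers);
        Volatile.Write(ref producersDone, true);
        consumer.Wait();

        Console.WriteLine("produced " + produced + ", consumed " + consumed + " in ~10 seconds");
    }
}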