53

The accepted answer to the question "Why does this Parallel.ForEach code freeze the program up?" advises replacing the List with a ConcurrentBag in a WPF application.

I'd like to understand whether a BlockingCollection could be used in this case instead.

TylerH
Fulproof

4 Answers

85

You can indeed use a BlockingCollection, but there is absolutely no point in doing so.

First off, note that BlockingCollection is a wrapper around a collection that implements IProducerConsumerCollection<T>. Any type that implements that interface can be used as the underlying storage:

When you create a BlockingCollection<T> object, you can specify not only the bounded capacity but also the type of collection to use. For example, you could specify a ConcurrentQueue<T> object for first in, first out (FIFO) behavior, or a ConcurrentStack<T> object for last in, first out (LIFO) behavior. You can use any collection class that implements the IProducerConsumerCollection<T> interface. The default collection type for BlockingCollection<T> is ConcurrentQueue<T>.

This includes ConcurrentBag<T>, which means you can have a blocking concurrent bag. So what's the difference between a plain IProducerConsumerCollection<T> and a blocking collection? The documentation of BlockingCollection says (emphasis mine):

BlockingCollection<T> is used as a wrapper for an IProducerConsumerCollection<T> instance, allowing removal attempts from the collection to block until data is available to be removed. Similarly, a BlockingCollection<T> can be created to enforce an upper-bound on the number of data elements allowed in the IProducerConsumerCollection<T> [...]

Since in the linked question there is no need to do either of these things, using BlockingCollection simply adds a layer of functionality that goes unused.
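To make the two extra features concrete (blocking removal and an upper bound), here is a minimal sketch. The stack-backed storage and the capacity of 2 are arbitrary choices for illustration and do not come from the linked question.

```csharp
using System;
using System.Collections.Concurrent;

// Wrap a ConcurrentStack<int> to get LIFO order, bounded to 2 items.
// (Both choices are assumptions made for this illustration.)
var stack = new BlockingCollection<int>(new ConcurrentStack<int>(), boundedCapacity: 2);

stack.Add(1);
stack.Add(2);

// The collection is full, so a non-blocking add fails instead of waiting.
bool addedThird = stack.TryAdd(3);

// Take would block on an empty collection; here items are available.
int first = stack.Take();   // LIFO: the most recently added item
int second = stack.Take();

// The collection is now empty, so a non-blocking take fails.
bool tookFromEmpty = stack.TryTake(out int _);

Console.WriteLine($"{addedThird} {first} {second} {tookFromEmpty}");
// prints "False 2 1 False"
```

A plain ConcurrentStack<T> or ConcurrentBag<T> has neither behavior: TryPop/TryTake return false right away on an empty collection, and there is no capacity limit.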

Jon
  • 3
    @Jon, thanks, this helped me a lot to break out of my state of idiocy and stop losing time studying ConcurrentBag and BlockingCollection when I really need ConcurrentDictionary – Fulproof Mar 15 '13 at 04:14
34
  • List<T> is a collection designed for use in single-threaded applications.

  • ConcurrentBag<T> is a class in the System.Collections.Concurrent namespace, designed to simplify using collections in multi-threaded environments. If you use ConcurrentCollection you do not have to lock your collection to prevent corruption by other threads. You can insert or take data from your collection without writing any special locking code.

  • BlockingCollection<T> is designed to remove the need to check whether new data is available in a collection shared between threads. If new data is inserted into the shared collection, your consumer thread wakes up immediately. So you do not have to poll for new data at fixed intervals, typically in a while loop.
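The consumer-wakes-up behavior described in the last point can be sketched as follows. This is only an illustrative example under simple assumptions (one producer, one consumer, integer items); the variable names are made up for the sketch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var queue = new BlockingCollection<int>();   // defaults to FIFO storage
var consumed = new List<int>();

// Consumer: GetConsumingEnumerable blocks while the collection is empty
// and finishes once CompleteAdding has been called and all items are taken.
Task consumer = Task.Run(() =>
{
    foreach (int item in queue.GetConsumingEnumerable())
    {
        consumed.Add(item);   // only this one task touches the list
    }
});

// Producer: add a few items, then signal that no more are coming.
for (int i = 1; i <= 5; i++)
{
    queue.Add(i);
}
queue.CompleteAdding();

consumer.Wait();
Console.WriteLine(string.Join(", ", consumed));
// prints "1, 2, 3, 4, 5"
```

There is no polling loop and no sleep interval anywhere: the foreach simply waits inside the enumerator until an item arrives or adding is completed.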

Ahmet Arslan
  • I see no class for `ConcurrentCollection`. From the decompiler: public class ConcurrentBag<T> : IProducerConsumerCollection<T>, IEnumerable<T>, IEnumerable, ICollection, IReadOnlyCollection<T> – C. Tewalt Oct 23 '19 at 17:46
  • 1
    I still gave +1 - it helped to have the clarification on the consumer thread awakening on BlockingCollection Thanks! – C. Tewalt Oct 23 '19 at 17:48
  • Hi @Ahmet Arslan, thank you for sharing this knowledge. I also want to use a BlockingCollection, because I do not want to check at fixed intervals in a while loop whether new data is available for the consumer thread, but I can't work out how to implement it. Can you please help me with a code snippet? – Dreamer Oct 03 '22 at 07:06
18

Whenever you find the need for a thread-safe List<T>, in most cases neither the ConcurrentBag<T> nor the BlockingCollection<T> is going to be your best option. Both collections are specialized for facilitating producer-consumer scenarios, so unless you have more than one thread concurrently adding and removing items from the collection, you should look for other options (with the best candidate being the ConcurrentQueue<T> in most cases).

As for the ConcurrentBag<T> specifically, it's an extremely specialized class targeting mixed producer-consumer scenarios. This means that each worker thread is expected to be both a producer and a consumer (one that adds and removes items from the same collection). It could be a good candidate for the internal storage of an ObjectPool class, but beyond that it is hard to imagine any advantageous usage scenario for this class.

People usually think that the ConcurrentBag<T> is the thread-safe equivalent of a List<T>, but it's not. The similarity of the two APIs is misleading. Calling Add on a List<T> results in the item being added at the end of the list. Calling Add on a ConcurrentBag<T> results instead in the item being added at a random slot inside the bag. The ConcurrentBag<T> is essentially unordered. It is not optimized for being enumerated, and does a lousy job when it is commanded to do so. Internally it maintains a bunch of thread-local queues, so the order of its contents is dominated by which thread did what, not by when something happened. Before each enumeration of the ConcurrentBag<T>, all these thread-local queues are copied to an array, adding pressure to the garbage collector (source code). So, for example, the line var item = bag.First(); results in a copy of the whole collection, just to return one element.

These characteristics make the ConcurrentBag<T> a less than ideal choice for storing the results of a Parallel.For/Parallel.ForEach loop.

A better thread-safe substitute of the List<T>.Add is the ConcurrentQueue<T>.Enqueue method. "Enqueue" is a less familiar word than "Add", but it actually does what you expect it to do.
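As a small sketch of that substitution, the loop below collects results with Enqueue. The source array and the squaring step are placeholders invented for the example; any per-item work would do.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical input and work, chosen only to keep the sketch short.
int[] source = Enumerable.Range(1, 100).ToArray();
var results = new ConcurrentQueue<int>();

Parallel.ForEach(source, item =>
{
    results.Enqueue(item * item);   // thread-safe, no explicit lock needed
});

// Items are stored in the order they were enqueued, which depends on
// thread scheduling, so sort before comparing with the expected values.
int[] sorted = results.OrderBy(x => x).ToArray();
Console.WriteLine(sorted.Length);   // prints "100"
```

Note that the queue preserves arrival order, not source order; the two differ whenever the loop body runs on more than one thread.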

There is nothing that a ConcurrentBag<T> can do that a ConcurrentQueue<T> can't. For example, neither collection offers a way to remove a specific item from the collection. If you want a concurrent collection with a TryRemove method that takes a key parameter, you could look at the ConcurrentDictionary<K,V> class.
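For reference, removing by key looks roughly like this. The proxy addresses and the bool values are made-up sample data for the sketch.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical data: proxy address -> whether the last check succeeded.
var proxies = new ConcurrentDictionary<string, bool>();
proxies.TryAdd("10.0.0.1:8080", true);
proxies.TryAdd("10.0.0.2:8080", false);

// TryRemove takes the key and hands back the removed value.
bool removed = proxies.TryRemove("10.0.0.2:8080", out bool wasWorking);

// A second attempt finds nothing to remove and returns false.
bool removedAgain = proxies.TryRemove("10.0.0.2:8080", out bool _);

Console.WriteLine($"{removed} {wasWorking} {removedAgain} {proxies.Count}");
// prints "True False False 1"
```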

The ConcurrentBag<T> appears frequently in the Task Parallel Library-related examples in Microsoft's documentation. Like here for example. Whoever wrote the documentation apparently valued the tiny usability advantage of writing Add instead of Enqueue more than the behavioral/performance disadvantage of using the wrong collection. This makes some sense considering that the examples were authored at a time when the TPL was new, and the goal was the fast adoption of the library by developers who were mostly unfamiliar with parallel programming. I get it, Enqueue is a scary word when you see it for the first time. Unfortunately there is now a whole generation of developers who have incorporated the ConcurrentBag<T> into their mental toolbox, although it has no business being there, considering how specialized this collection is.

In case you want to collect the results of a Parallel.ForEach loop in exactly the same order as the source elements, you can use a List<T> protected with a lock. In most cases the overhead will be negligible, especially if the work inside the loop is chunky. An example is shown below, featuring the Select LINQ operator for getting the index of each element.

var indexedSource = source.Select((item, index) => (item, index));

List<TResult> results = new();

Parallel.ForEach(indexedSource, parallelOptions, entry =>
{
    var (item, index) = entry;
    TResult result = GetResult(item);
    lock (results)
    {
        while (results.Count <= index) results.Add(default);
        results[index] = result;
    }
});

This is for the case where the source is a deferred sequence of unknown size. If you know its size beforehand, it is even simpler. Just preallocate a TResult[] array, and update it in parallel without locking:

TResult[] results = new TResult[source.Count];

Parallel.For(0, source.Count, parallelOptions, i =>
{
    results[i] = GetResult(source[i]);
});

The TPL includes memory barriers at the end of task executions, so all the values of the results array will be visible from the current thread (citation).

Theodor Zoulias
3

Yes, you could use BlockingCollection for that. finishedProxies would be defined as:

BlockingCollection<string> finishedProxies = new BlockingCollection<string>();

and to add an item, you would write:

finishedProxies.Add(checkResult);

And when it's done, you could create a list from the contents.
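Put together, a minimal sketch might look like the following. The proxies array and the ToUpperInvariant call are stand-ins for the real input and the real check, which are not shown in this thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical input; the real code would check actual proxies.
string[] proxies = { "a", "b", "c", "d" };
var finishedProxies = new BlockingCollection<string>();

Parallel.ForEach(proxies, proxy =>
{
    string checkResult = proxy.ToUpperInvariant();   // stand-in for the real check
    finishedProxies.Add(checkResult);
});

// Enumerating a BlockingCollection<T> yields a snapshot and does not remove items.
List<string> list = finishedProxies.ToList();
list.Sort(StringComparer.Ordinal);   // arrival order is not deterministic
Console.WriteLine(string.Join(",", list));
// prints "A,B,C,D"
```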

Jim Mischel
  • 2
    Jim Mischel, I'm reading your [.NET Reference Guide](http://www.informit.com/guides/guide.aspx?g=dotnet). I wish I had found it earlier; it is so close to my practical needs – Fulproof Mar 15 '13 at 04:24