
So there's a lot of talk about concurrency and multi-threading lately, and for good reason, but I'm having trouble seeing practical applications.

Like, .NET recently added ConcurrentDictionary, ConcurrentBag, ConcurrentStack, ConcurrentQueue, etc.

What are some concrete practical examples of when these would come into play? Ideally I'd like some easily relatable scenario where either the non-concurrent one would fail or the concurrent one would be much faster due to being easy to parallelize.

I'm primarily interested in the ConcurrentDictionary if that makes providing any examples easier. When I think about it, I don't understand how it being concurrent can help with speed.

Let's say you had a data source you wished to add to it. Wouldn't iterating over the data source across 8 threads distributed across 8 cores be similar in speed to using just one thread, because in either case you're starting from one end and going to the other, and with the single-threaded one there won't be any contention from adding data that has already been added?

Basically, I want to understand how concurrency can make programs faster and in what situations the non-concurrent implementation would fail, because I currently can't seem to think of many examples where they would be useful in this regard. (Primary language C#)

  • I don't find `ConcurrentQueue` particularly useful by itself, but as the backing store for `BlockingCollection`, it makes creating producer/consumer applications very easy. See the [second part](http://blog.mischel.com/2013/07/18/simple-multithreading-part-2/) of my [Simple Multithreading](http://blog.mischel.com/2013/07/16/simple-multithreading-part-1/) blog series for an example. – Jim Mischel Mar 24 '15 at 16:08
  • Concurrent data structures do not give you speed; in fact, they slow you down compared with regular collections. But they guarantee that your data is correct, so they eliminate so-called heisenbugs. – Alex Sikilinda Mar 24 '15 at 16:08

2 Answers


A concurrent data structure will, in general, be slower at adding items than a data structure that doesn't support concurrent modification, if you're only adding one thing at a time. That is, if your code is simply a single thread that does this:

foreach (var item in list)
{
    AddItemToDataStructure(item);
}

That will run faster if the data structure is Dictionary than if it's ConcurrentDictionary.

But if you have multiple threads doing that against ConcurrentDictionary, you do get some speed improvement. A lot of the work involved in updating a dictionary involves determining where the item will go and if an item with that key already exists. That work can be done by multiple threads concurrently. Only the final insert (i.e. updating the internal data structures) requires some kind of synchronization to prevent corruption of the data structure. And that synchronization turns out to be pretty minimal most of the time.

(The above is a simplified explanation, but I think you get the idea.)
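To make the multi-threaded case concrete, here's a minimal sketch (the integer source and the bucket-counting logic are illustrative assumptions, not from the answer): many threads hash and probe concurrently, and `AddOrUpdate` retries a small internal compare-and-swap on contention rather than taking a global lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        int[] source = Enumerable.Range(0, 100_000).ToArray();

        // Many threads hash, probe, and insert at once; only the final
        // bucket update needs brief internal synchronization.
        var dict = new ConcurrentDictionary<int, int>();
        Parallel.ForEach(source, item =>
        {
            // Count how many items fall into each of 10 buckets (item % 10).
            dict.AddOrUpdate(item % 10, 1, (key, count) => count + 1);
        });

        Console.WriteLine(dict.Count);        // 10 buckets
        Console.WriteLine(dict.Values.Sum()); // 100000 items counted, none lost
    }
}
```

Whether this beats a single thread in practice depends on how much work happens per item; trivially cheap adds may still be dominated by coordination overhead.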

As for ConcurrentQueue and ConcurrentStack, they're not really there to make things faster, but rather to make things possible. Say, for example, that you have two threads reading data from some external source, and one thread processing data that's read. You have two producers and one consumer. They communicate through a shared queue. So what you have is two threads that do this:

while data is available
    read data
    add data to shared queue

And one thread that's doing this:

while not end of program
    while data is in queue
        read data from queue
        process data

That's possible with the non-concurrent data structures (i.e. Queue and Stack) if you wrap those data structures with locks, but it's surprisingly difficult to get that right. And if you want to eliminate polling loops it's even more difficult. The concurrent data structures do all that for you.
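The producer/consumer shape above maps almost directly onto `BlockingCollection`, which uses `ConcurrentQueue` as its default backing store. A minimal sketch (the integer data stands in for whatever the external source supplies):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // BlockingCollection handles the blocking/wakeup logic,
        // so the consumer needs no polling loop.
        using var queue = new BlockingCollection<int>(boundedCapacity: 100);

        // Two producers, standing in for threads reading an external source.
        var producers = Task.WhenAll(
            Task.Run(() => { for (int i = 0; i < 500; i++) queue.Add(i); }),
            Task.Run(() => { for (int i = 500; i < 1000; i++) queue.Add(i); }));

        // One consumer: GetConsumingEnumerable blocks until data arrives
        // and ends cleanly once CompleteAdding has been called.
        long sum = 0;
        var consumer = Task.Run(() =>
        {
            foreach (int item in queue.GetConsumingEnumerable())
                sum += item;
        });

        producers.Wait();
        queue.CompleteAdding(); // signals "end of program" to the consumer
        consumer.Wait();

        Console.WriteLine(sum); // 0 + 1 + ... + 999 = 499500
    }
}
```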

The result is that using ConcurrentQueue will most likely be faster than putting a simple synchronization wrapper around Queue.

Concurrency makes programs faster by allowing multiple threads doing disparate or related tasks concurrently. Concurrent data structures facilitate that by providing reliable communication between threads. In addition, concurrent data structures will usually outperform a corresponding non-concurrent data structure that's enclosed in a simple synchronization wrapper.

Jim Mischel

In Java, `HashMap` breaks when used concurrently without sufficient locking. See "Is a HashMap thread-safe for different keys?" for a good answer.
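The same restriction applies in .NET: a plain `Dictionary` is not safe for concurrent writers, and the symptoms (lost entries, exceptions, even hangs during an internal resize) vary with timing. This hedged sketch therefore shows only the two safe patterns, a lock-guarded `Dictionary` versus `ConcurrentDictionary`:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Option 1: a plain Dictionary guarded by a lock. Correct, but
        // every thread serializes on the same monitor.
        var locked = new Dictionary<int, int>();
        var gate = new object();
        Parallel.For(0, 10_000, i =>
        {
            lock (gate) { locked[i] = i; }
        });

        // Option 2: ConcurrentDictionary synchronizes internally at a
        // finer grain, so writers rarely block each other.
        var concurrent = new ConcurrentDictionary<int, int>();
        Parallel.For(0, 10_000, i => concurrent[i] = i);

        Console.WriteLine(locked.Count);     // 10000
        Console.WriteLine(concurrent.Count); // 10000
    }
}
```

Removing the `lock` from option 1 reproduces the failure the answer describes, but not deterministically, which is why it isn't demonstrated live here.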

In Java, a `ConcurrentSkipListSet` uses the skip list data structure and so can be updated and iterated by multiple threads concurrently, without breaking and without using locks. It is therefore much faster than implementing a locking regime on an ordinary `Set`.
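.NET has no built-in concurrent sorted set, but a comparable lock-free pattern is possible by atomically swapping an `ImmutableSortedSet`. A sketch, assuming the `System.Collections.Immutable` package is available:

```csharp
using System;
using System.Collections.Immutable;
using System.Threading.Tasks;

class Program
{
    static ImmutableSortedSet<int> _set = ImmutableSortedSet<int>.Empty;

    static void Main()
    {
        // Each Add produces a new set; ImmutableInterlocked.Update retries
        // a compare-and-swap until the new version is published, so no
        // thread ever blocks on a lock.
        Parallel.For(0, 1_000, i =>
            ImmutableInterlocked.Update(ref _set, s => s.Add(i)));

        Console.WriteLine(_set.Count); // 1000
        Console.WriteLine(_set.Min);   // 0
        Console.WriteLine(_set.Max);   // 999
    }
}
```

Under heavy write contention the retry loop rebuilds the set repeatedly, so this trades raw throughput for simplicity; a skip-list-based structure like Java's scales better for frequent writes.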

OldCurmudgeon
  • I'm no expert, but I have yet to see a skip list implementation that does not use locks and that is also guaranteed to be free of deadlock and to maintain the shape invariants that guarantee high speed when operations are interleaved. Proper lock-free structures seem to be devilishly complex, and skip lists don't seem to have the sorts of single-pass operations that let them really work. – dfeuer Mar 24 '15 at 17:55