26

Related brief info:

AFAIK, the concurrent stack, queue, and bag classes are implemented internally with linked lists.
And I know that there is much less contention because each thread is responsible for its own linked list. Anyway, my question is about the ConcurrentDictionary<,>.

But I was testing this code (single thread):

    Stopwatch sw = new Stopwatch();
    sw.Start();

    var d = new ConcurrentDictionary<int, int>();
    for (int i = 0; i < 1000000; i++) d[i] = 123;
    for (int i = 1000000; i < 2000000; i++) d[i] = 123;
    for (int i = 2000000; i < 3000000; i++) d[i] = 123;
    Console.WriteLine("baseline = " + sw.Elapsed);

    sw.Restart();

    var d2 = new Dictionary<int, int>();
    for (int i = 0; i < 1000000; i++) lock (d2) d2[i] = 123;
    for (int i = 1000000; i < 2000000; i++) lock (d2) d2[i] = 123;
    for (int i = 2000000; i < 3000000; i++) lock (d2) d2[i] = 123;
    Console.WriteLine("baseline = " + sw.Elapsed);

    sw.Stop();

Result (tested many times, same values +/-):

baseline = 00:00:01.2604656
baseline = 00:00:00.3229741

Question :

What makes ConcurrentDictionary<,> much slower in a single-threaded environment?

My first instinct was that lock(){} would always be slower, but apparently it is not.

Royi Namir
  • 144,742
  • 138
  • 468
  • 792
  • 4
    First, are you running your tests in release mode without the debugger attached? (i.e. "Run without debugging"). Second, "each thread is responsible for its own linked list" is incorrect. If you want to see how these things are implemented, download the library source code from http://referencesource.microsoft.com/netframework.aspx – Jim Mischel Mar 06 '13 at 16:03
  • 7
    I'm surprised that you're surprised. There is an overhead associated with ensuring that all operations are atomic. When the operation itself is so blindingly simple (in the case of `Dictionary` it's only setting a single array value and almost nothing else) that overhead can be a lot in comparison. When compared to non-trivial operations, the overhead is more likely to be negligible. – Servy Mar 06 '13 at 16:03
  • @JimMischel http://books.google.co.il/books?id=EvXX03I5iLYC&pg=PA951&lpg=PA951&dq=%22Inside+a+concurrent+bag,+each+thread+gets+it+own+private+linked+list%22&source=bl&ots=jFhf0CvSr1&sig=_At669GUNjMflVZdsU6x0MZds5c&hl=en&sa=X&ei=MWk3UcayC9DltQa8soFA&sqi=2&redir_esc=y#v=onepage&q=%22Inside%20a%20concurrent%20bag%2C%20each%20thread%20gets%20it%20own%20private%20linked%20list%22&f=false – Royi Namir Mar 06 '13 at 16:05
  • 3
    That's only true of `ConcurrentBag`, which is a quite different thing than the other collections you mentioned. A `ConcurrentQueue`, for example, couldn't be implemented in the same way, because then it wouldn't be FIFO. And `ConcurrentStack` wouldn't be LIFO. – Jim Mischel Mar 06 '13 at 16:31
  • Ok Jim. thanks for clearing it up – Royi Namir Mar 06 '13 at 16:34
  • For CLR 2.0+, locks are optimized to lazily create a sync block only when there is contention. In your case this never happens, so there is basically no overhead in your `Dictionary<int, int>` test. – Alex Mar 07 '13 at 06:47
  • Did anyone also test reading out of a concurrent dictionary vs a normal one? – Dirk Boer Jan 31 '16 at 10:40
  • 1
    I imagine your reasoning was, "I could either do this myself using lock or use ConcurrentDictionary, which was written by experts. Since the experts also had the option to use locks, surely their implementation will be equal to or better than mine." Surprisingly, a lot of the advanced approaches (e.g., lock-free algorithms) actually have more overhead than standard locks, but their throughput is higher under heavy load due to better handling of contention. See also [Reed Copsey](http://stackoverflow.com/a/5680988/18192)'s answer to another question. – Brian Jul 08 '16 at 15:29
  • Of course, in the specific case of `ConcurrentDictionary`, a big part of the performance-improving sauce is that different buckets have their own independent locks (this explanation is slightly inaccurate). Thus, in many cases it is possible to avoid contention entirely; different threads will end up writing to different sections of the dictionary, which have separate locks and thus avoid blocking. This adds overhead, but improves multi-threaded throughput, since in many cases two threads can simultaneously touch the dictionary with *neither* of them blocking. – Brian Jul 08 '16 at 15:39
  • You use concurrent dictionary when you need a dictionary that is being accessed concurrently. Which would you prefer, being able to access a dictionary in 30ns unsafely or being able to access it in 90ns safely? – Mick Sep 20 '16 at 02:08
  • Because of all the locking and overhead, a ConcurrentDictionary can also be slower in a multithreaded environment than a regular Dictionary with locks. [Here's a link with some performance specs](http://cc.davelozinski.com/c-sharp/dictionary-vs-concurrentdictionary). In most of the test scenarios the plain Dictionary with a lock was more than twice as fast as a ConcurrentDictionary. –  Mar 06 '18 at 12:42

9 Answers

32

Well, ConcurrentDictionary is allowing for the possibility that it can be used by multiple threads. It seems entirely reasonable to me that that requires more internal housekeeping than something which assumes it can get away without worrying about access from multiple threads. I'd have been very surprised if it had worked out the other way round - if the safer version were always faster too, why would you ever use the less safe version?

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 2
    Perhaps the degree of difference surprised him? – Servy Mar 06 '13 at 16:05
  • I'm not surprised by that factor, but I just think it's more likely the OP thought the slowdown should be say 1/2 or 20% or something, instead of thinking that `ConcurrentDictionary` would be faster than `Dictionary`. – Servy Mar 06 '13 at 16:10
  • @Servy No. I was thinking that if a BMW can drive fast over long distances, it can surely drive fast over short distances too (if you see what I mean). – Royi Namir Mar 06 '13 at 16:11
  • @RoyiNamir No, I don't. A `ConcurrentDictionary` supports more robust functionality at the cost of speed. It will almost always be slower than a `Dictionary`, it just happens to work in cases where a `Dictionary` would produce garbage output (although it would produce it more quickly). – Servy Mar 06 '13 at 16:12
  • 6
    @Servy: A better comparison would be between a tank and a sports car. A tank is more robust, but will generally be slower... – Jon Skeet Mar 06 '13 at 16:23
  • 7
    @JonSkeet I'm not so sure. Who would ever pass up driving a tank for a sports car!? – Servy Mar 06 '13 at 16:24
  • 1
    I don't understand: if locking on the dictionary works and performs faster, why use `ConcurrentDictionary` at all? From an amateur perspective like mine (not academic), I would simply lock on `SyncRoot` for all accessible methods. How else could it be _unsafe_, or how could it become safer? I don't understand. – M.kazem Akhgary Aug 31 '17 at 17:42
  • 2
    @M.kazemAkhgary: I don't see figures that show that locking on the dictionary from multiple threads is faster than using `ConcurrentDictionary`. That's certainly not what the question shows. It's also worth understanding that there are more operations than just what's shown here - things like iterating over a dictionary, where you wouldn't want to acquire the lock for the entirety of the iteration time. – Jon Skeet Aug 31 '17 at 18:03
  • I see, that makes sense. `ConcurrentDictionary` may perform better in a multi-threaded environment. Good point about iteration: for example, if a normal dictionary exceeds its capacity, we lock the Add operation until the entire dictionary is reorganized. Since `ConcurrentDictionary` locks in its internal implementation, it has more control over locking (and locks only when needed, I assume), whereas using a dictionary forces me to lock at the root of the call. – M.kazem Akhgary Aug 31 '17 at 18:16
  • @JonSkeet Funnily enough, ConcurrentDictionary reads on .NET 5 now appear to be faster than Dictionary reads. Benchmark.Net results for 10k `TryGetValue` operations come in at 63us vs. 70us in ConcurrentDictionary's favor. – Mike Marynowski Dec 18 '20 at 03:31
30

The most likely reason is that ConcurrentDictionary simply has more overhead than Dictionary for the same operation. This is demonstrably true if you dig into the sources:

  • It uses a lock for the indexer
  • It uses volatile writes
  • It has to do atomic writes of values which are not guaranteed to be atomic in .Net
  • It has extra branches in the core add routine (whether to take a lock, do atomic write)

All of these costs are incurred irrespective of the number of threads that it's being used on. These costs may be individually small, but they aren't free and do add up over time.
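
As a rough single-threaded illustration of how those per-call costs add up (a sketch only, not ConcurrentDictionary's actual code path), compare a plain store with a store that also takes an uncontended lock and does a volatile publish:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class OverheadSketch
    {
        static readonly int[] Plain = new int[1];
        static readonly int[] Guarded = new int[1];
        static readonly object Gate = new object();

        static void Main()
        {
            const int N = 10000000;
            var sw = Stopwatch.StartNew();

            // Plain write: just a store into the array slot.
            for (int i = 0; i < N; i++) Plain[0] = i;
            Console.WriteLine("plain   = " + sw.Elapsed);

            sw.Restart();

            // "Concurrent-style" write: an uncontended lock plus a volatile (release) store.
            for (int i = 0; i < N; i++)
            {
                lock (Gate)
                {
                    Volatile.Write(ref Guarded[0], i);
                }
            }
            Console.WriteLine("guarded = " + sw.Elapsed);
        }
    }

Even with zero contention, the guarded loop pays for the lock and the fence on every iteration, which is the same kind of fixed overhead the list above describes.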

JaredPar
  • 733,204
  • 149
  • 1,241
  • 1,454
  • Aren't _writes_ volatile by default? – Royi Namir Mar 06 '13 at 16:17
  • 1
    @RoyiNamir no, in *general* writes are atomic by default (except for struct types > pointer size) – JaredPar Mar 06 '13 at 16:19
  • I was reading here (http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/) that --- _in C# all writes are volatile (unlike say in Java), regardless of whether you write to a volatile or a non-volatile field._ – Royi Namir Mar 06 '13 at 16:22
  • @RoyiNamir that particular blog entry is out of date and is not correct for ARM IIRC. Igor, the blog author, and I work on the same team. I'm going to ask him about this in person today when he gets in. – JaredPar Mar 06 '13 at 16:27
  • 2
    @RoyiNamir I chatted with Igor and he says indeed that particular line is incorrect with respect to ARM. He did recently write 2 articles for MSDN magazine which he says are more up to date and account for ARM http://msdn.microsoft.com/en-us/magazine/jj863136.aspx http://msdn.microsoft.com/en-us/magazine/jj883956.aspx – JaredPar Mar 06 '13 at 16:49
  • Thank you. (Sorry, but what is ARM?) – Royi Namir Mar 06 '13 at 16:51
  • 1
    @RoyiNamir it's a particular processor type like x86 and amd64/x64. Most tablets run on ARM processors (iPad, Surface, etc ...) – JaredPar Mar 06 '13 at 16:55
13

Update for .NET 5: I'll leave the previous answer up as it is still relevant for older runtimes, but .NET 5 appears to have further improved ConcurrentDictionary to the point where reads via TryGetValue() are actually faster than even the normal Dictionary, as seen in the results below (COW is my CopyOnWriteDictionary, detailed below). Make what you will of this :)

|          Method |        Mean |     Error |    StdDev |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
|---------------- |------------:|----------:|----------:|---------:|---------:|---------:|----------:|
| ConcurrentWrite | 1,372.32 us | 12.752 us | 11.304 us | 226.5625 |  89.8438 |  44.9219 | 1398736 B |
|        COWWrite | 1,077.39 us | 21.435 us | 31.419 us |  56.6406 |  19.5313 |  11.7188 |  868629 B |
|       DictWrite |   347.19 us |  5.875 us |  5.208 us | 124.5117 | 124.5117 | 124.5117 |  673064 B |
|  ConcurrentRead |    63.53 us |  0.486 us |  0.431 us |        - |        - |        - |         - |
|         COWRead |    81.55 us |  0.908 us |  0.805 us |        - |        - |        - |         - |
|        DictRead |    70.71 us |  0.471 us |  0.393 us |        - |        - |        - |         - |

Previous answer, still relevant for < .NET 5:

The latest versions of ConcurrentDictionary have improved significantly since I originally posted this answer. It no longer locks on reads and thus offers almost the same performance profile as my CopyOnWriteDictionary implementation, with more features, so I recommend you use ConcurrentDictionary instead in most cases. ConcurrentDictionary still has 20-30% more overhead than Dictionary or CopyOnWriteDictionary, so performance-sensitive applications may still benefit from CopyOnWriteDictionary.

You can read about my lock-free thread-safe copy-on-write dictionary implementation here:

http://www.singulink.com/CodeIndex/post/fastest-thread-safe-lock-free-dictionary

It's currently append-only (with the ability to replace values) as it is intended for use as a permanent cache. If you need removal then I suggest using ConcurrentDictionary since adding that into CopyOnWriteDictionary would eliminate all performance gains due to the added locking.

CopyOnWriteDictionary is very fast for quick bursts of writes and lookups usually run at almost standard Dictionary speed without locking. If you write occasionally and read often, this is the fastest option available.

My implementation provides maximum read performance by removing the need for any read locks under normal circumstances while updates aren't being made to the dictionary. The trade-off is that the dictionary needs to be copied and swapped after updates are applied (which is done on a background thread) but if you don't write often or you only write once during initialization then the trade-off is definitely worth it.
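
To make the copy-and-swap idea concrete, here is a minimal sketch of a copy-on-write map (this is not the author's CopyOnWriteDictionary; the class and members are hypothetical, and the copy happens synchronously rather than on a background thread):

    using System.Collections.Generic;

    // Readers use an immutable snapshot with no locks; writers clone the snapshot
    // under a lock and atomically publish the new copy by replacing the reference.
    public class CopyOnWriteMap<TKey, TValue>
    {
        private volatile Dictionary<TKey, TValue> _snapshot = new Dictionary<TKey, TValue>();
        private readonly object _writeLock = new object();

        // Lock-free read: the snapshot reference is swapped atomically, never mutated in place.
        public bool TryGetValue(TKey key, out TValue value)
        {
            return _snapshot.TryGetValue(key, out value);
        }

        // Writers pay for a full copy; that is the trade-off described above.
        public void Set(TKey key, TValue value)
        {
            lock (_writeLock)
            {
                var copy = new Dictionary<TKey, TValue>(_snapshot);
                copy[key] = value;
                _snapshot = copy; // publish the new snapshot
            }
        }
    }

The real implementation defers the copy-and-swap to a background thread; this sketch copies synchronously only to keep the idea visible.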

Mike Marynowski
  • 3,156
  • 22
  • 32
  • 1
    Note that the latest `ConcurrentDictionary` is lock-free for read operations. It does not have much overhead if you rarely write or update. https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2?view=netcore-3.0 – Mani Gandham Aug 12 '19 at 12:51
  • @ManiGandham Did you benchmark it v.s. standard `Dictionary` or find benchmarks somewhere? – Mike Marynowski Aug 14 '19 at 04:35
  • Rough benchmarking but it made no significant difference for us. You should probably benchmark it yourself but you can see the read path in the code: https://github.com/dotnet/corefx/blob/master/src/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentDictionary.cs – Mani Gandham Aug 14 '19 at 15:16
  • 1
    @ManiGandham Impressive - very large improvement over the previous version. Just benchmarked it and it's essentially the same speed as mine for reads and single threaded writes, so there isn't really a reason to use mine anymore. Thanks for the info! – Mike Marynowski Aug 14 '19 at 18:15
  • Update as of .NET Core 3.1 after some recent optimizations to my implementation: `CopyOnWriteDictionary` is 20 - 30% faster than `ConcurrentDictionary` when calling `TryGetValue` depending on how the test is done. – Mike Marynowski Mar 26 '20 at 10:11
3

ConcurrentDictionary vs. Dictionary

In general, use a System.Collections.Concurrent.ConcurrentDictionary in any scenario where you are adding and updating keys or values concurrently from multiple threads. In scenarios that involve frequent updates and relatively few reads, the ConcurrentDictionary generally offers modest benefits. In scenarios that involve many reads and many updates, the ConcurrentDictionary generally is significantly faster on computers that have any number of cores.

In scenarios that involve frequent updates, you can increase the degree of concurrency in the ConcurrentDictionary and then measure to see whether performance increases on computers that have more cores. If you change the concurrency level, avoid global operations as much as possible.

If you are only reading keys or values, the Dictionary is faster because no synchronization is required if the dictionary is not being modified by any threads.

Link: https://msdn.microsoft.com/en-us/library/dd997373%28v=vs.110%29.aspx
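
For example, the concurrency level mentioned above can be set through the constructor overload that takes a concurrency level and an initial capacity (the values below are only illustrative starting points to measure against):

    using System;
    using System.Collections.Concurrent;

    class ConcurrencyLevelExample
    {
        static void Main()
        {
            // Illustrative values: a few lock stripes per logical processor, and an
            // initial capacity large enough to avoid early resizing.
            int concurrencyLevel = Environment.ProcessorCount * 4;
            int initialCapacity = 100003;

            var map = new ConcurrentDictionary<int, string>(concurrencyLevel, initialCapacity);
            map[42] = "answer";
            Console.WriteLine(map[42]);
        }
    }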

gagangupt16
  • 231
  • 1
  • 14
2

The ConcurrentDictionary<> creates an internal set of locking objects at creation (this is determined by the concurrencyLevel, amongst other factors) - this set of locking objects is used to control access to the internal bucket structures in a series of fine-grained locks.

In a single threaded scenario, there would be no need for the locks, so the extra overhead of acquiring and releasing these locks is probably the source of the difference you're seeing.
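
A rough sketch of that striping idea (an illustration of the concept only, not ConcurrentDictionary's actual source):

    using System;

    // A fixed set of lock objects is created up front; each key's hash picks one
    // stripe, so writers that hit different stripes do not block each other.
    class StripedLockSketch
    {
        private readonly object[] _locks;

        public StripedLockSketch(int concurrencyLevel)
        {
            _locks = new object[concurrencyLevel];
            for (int i = 0; i < _locks.Length; i++)
                _locks[i] = new object();
        }

        public void WithLockFor(object key, Action action)
        {
            // Map the key's hash code to one of the lock stripes.
            int index = (key.GetHashCode() & 0x7FFFFFFF) % _locks.Length;
            lock (_locks[index])
            {
                action();
            }
        }
    }

In a single-threaded run every one of those lock acquisitions is pure overhead, which is the difference the question measured.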

JerKimball
  • 16,584
  • 3
  • 43
  • 55
2

There is no point in using ConcurrentDictionary in one thread, or in synchronizing access if everything is done in a single thread. Of course Dictionary will beat ConcurrentDictionary.

Much depends on the usage pattern and the number of threads. Here is a test that shows ConcurrentDictionary outperforming Dictionary-with-lock as the thread count increases.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

namespace ConsoleApp
{

    class Program
    {

        static void Main(string[] args)
        {
            Run(1, 100000, 10);
            Run(10, 100000, 10);
            Run(100, 100000, 10);
            Run(1000, 100000, 10);
            Console.ReadKey();
        }

        static void Run(int threads, int count, int cycles)
        {
            Console.WriteLine("");
            Console.WriteLine($"Threads: {threads}, items: {count}, cycles:{cycles}");

            var semaphore = new SemaphoreSlim(0, threads);

            var concurrentDictionary = new ConcurrentDictionary<int, string>();

            for (int i = 0; i < threads; i++)
            {
                Thread t = new Thread(() => Run(concurrentDictionary, count, cycles,  semaphore));
                t.Start();
            }

            Thread.Sleep(1000);

            var w = Stopwatch.StartNew();

            semaphore.Release(threads);

            for (int i = 0; i < threads; i++)
                semaphore.Wait();

            Console.WriteLine($"ConcurrentDictionary: {w.Elapsed}");

            var dictionary = new Dictionary<int, string>();
            for (int i = 0; i < threads; i++)
            {
                Thread t = new Thread(() => Run(dictionary, count, cycles, semaphore));
                t.Start();
            }

            Thread.Sleep(1000);

            w.Restart();

            semaphore.Release(threads);


            for (int i = 0; i < threads; i++)
                semaphore.Wait();

            Console.WriteLine($"Dictionary: {w.Elapsed}");

        }

        static void Run(ConcurrentDictionary<int, string> dic, int elements, int cycles, SemaphoreSlim semaphore)
        {
            semaphore.Wait();
            try
            {
                for (int i = 0; i < cycles; i++)
                    for (int j = 0; j < elements; j++)
                    {
                        var x = dic.GetOrAdd(i, k => k.ToString());
                    }
            }
            finally
            {
                semaphore.Release();
            }
        }

        static void Run(Dictionary<int, string> dic, int elements, int cycles, SemaphoreSlim semaphore)
        {
            semaphore.Wait();
            try
            {
                for (int i = 0; i < cycles; i++)
                    for (int j = 0; j < elements; j++)
                        lock (dic)
                        {
                            if (!dic.TryGetValue(i, out string value))
                                dic[i] = i.ToString();
                        }
            }
            finally
            {
                semaphore.Release();
            }
        }
    }
}

Threads: 1, items: 100000, cycles:10
ConcurrentDictionary: 00:00:00.0000499
Dictionary: 00:00:00.0000137

Threads: 10, items: 100000, cycles:10
ConcurrentDictionary: 00:00:00.0497413
Dictionary: 00:00:00.2638265

Threads: 100, items: 100000, cycles:10
ConcurrentDictionary: 00:00:00.2408781
Dictionary: 00:00:02.2257736

Threads: 1000, items: 100000, cycles:10
ConcurrentDictionary: 00:00:01.8196668
Dictionary: 00:00:25.5717232

xll
  • 3,019
  • 2
  • 16
  • 19
0

What makes ConcurrentDictionary<,> much slower in a single threaded environment?

The overhead of the machinery required to make it much faster in multi-threaded environments.

My first instinct was that lock(){} would always be slower, but apparently it is not.

A lock is very cheap when uncontested. You can lock a million times per second and your CPU won't even notice, provided that you are doing it from a single thread. What kills performance in multi-threaded programs is contention for locks. When multiple threads are competing fiercely for the same lock, almost all of them have to wait for the lucky one that holds the lock to release it. This is where the ConcurrentDictionary, with its granular locking implementation, shines. And the more concurrency you have (the more processors/cores), the more it shines.
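
A quick way to see the uncontended vs. contended difference for yourself (a sketch; the thread and iteration counts are arbitrary):

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    class LockContentionSketch
    {
        static readonly object Gate = new object();
        static long _counter;

        static void Main()
        {
            const int N = 1000000;

            // Uncontended: a single thread takes the lock a million times.
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < N; i++) { lock (Gate) { _counter++; } }
            Console.WriteLine("1 thread,  uncontended: " + sw.Elapsed);

            // Contended: four threads fight over the same lock.
            _counter = 0;
            sw.Restart();
            var tasks = new Task[4];
            for (int t = 0; t < tasks.Length; t++)
            {
                tasks[t] = Task.Run(() =>
                {
                    for (int i = 0; i < N; i++) { lock (Gate) { _counter++; } }
                });
            }
            Task.WaitAll(tasks);
            Console.WriteLine("4 threads, contended:   " + sw.Elapsed + ", counter = " + _counter);
        }
    }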

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
0

In .NET 4, ConcurrentDictionary used very poor lock management and contention resolution, which made it extremely slow. A Dictionary with custom locking, or even test-and-set usage to COW (copy-on-write) the whole dictionary, was faster.
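
A rough sketch of the test-and-set/COW approach described here (illustrative only, with hypothetical names; it publishes a copied dictionary with Interlocked.CompareExchange):

    using System.Collections.Generic;
    using System.Threading;

    class CasCowMap<TKey, TValue>
    {
        private Dictionary<TKey, TValue> _current = new Dictionary<TKey, TValue>();

        // Readers see whichever immutable snapshot is currently published.
        public bool TryGetValue(TKey key, out TValue value)
        {
            return Volatile.Read(ref _current).TryGetValue(key, out value);
        }

        public void Set(TKey key, TValue value)
        {
            while (true)
            {
                var snapshot = Volatile.Read(ref _current);
                var copy = new Dictionary<TKey, TValue>(snapshot);
                copy[key] = value;

                // Test-and-set: publish the copy only if no other writer got there first.
                if (Interlocked.CompareExchange(ref _current, copy, snapshot) == snapshot)
                    return;
            }
        }
    }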

Grwww
  • 51
  • 4
-7

Your test is wrong: you must stop the Stopwatch first!

        Stopwatch sw = new Stopwatch();      
        sw.Start();
        var d = new ConcurrentDictionary<int, int>();
        for (int i = 0; i < 1000000; i++) d[i] = 123;
        for (int i = 1000000; i < 2000000; i++) d[i] = 123;
        for (int i = 2000000; i < 3000000; i++) d[i] = 123;
        sw.Stop();
        Console.WriteLine("baseline = " + sw.Elapsed);



        sw.Start();
        var d2 = new Dictionary<int, int>();
        for (int i = 0; i < 1000000; i++) lock (d2) d2[i] = 123;
        for (int i = 1000000; i < 2000000; i++) lock (d2) d2[i] = 123;
        for (int i = 2000000; i < 3000000; i++) lock (d2) d2[i] = 123;
        sw.Stop();
        Console.WriteLine("baseline = " + sw.Elapsed);

        sw.Stop();

--Output :

  • 1
    From MSDN: "Use Stop to stop the current interval measurement and retain the cumulative elapsed time value. Use Restart to stop current interval measurement and start a new interval measurement." And you do not need to stop the watch to get the elapsed time, so his code is correct. – masimplo Jun 17 '15 at 14:22