26

I need a fast replacement for System.Collections.Generic.Dictionary<TKey, TValue>. My application needs to be really fast, so the replacement should support:

  • Generics
  • Add
  • Get
  • Contains

... and that's it. I don't need any LINQ support or anything. It just needs to be fast.

A simple code like:

Stopwatch stopWatch = Stopwatch.StartNew();

Dictionary<string, string> dictionary = new Dictionary<string, string>();
dictionary.Add("fieldName", "fieldValue");
dictionary.Add("Title", "fieldVaaaaaaaaaaaaaaaaalue");

Console.WriteLine(stopWatch.Elapsed);

... prints 00:00:00.0001274, which is a lot of time for me, because my application does many other things, some of them through old, slow libraries that I have to use and that are outside my control.

Any ideas on how to implement a faster one?

Thank you.

Alon Gubkin
  • 56,458
  • 54
  • 195
  • 288

10 Answers

78

Chances are you're seeing JIT compilation. On my box, I see:

00:00:00.0000360
00:00:00.0000060

when I run it twice in quick succession within the same process - and not in the debugger. (Make sure you're not running it in the debugger, or it's a pointless test.)

Now, measuring any time that tiny is generally a bad idea. You'd need to iterate millions of times to get a better idea of how long it's taking.

Do you have good reason to believe it's actually slowing down your code - or are you basing it all on your original timing?

I doubt that you'll find anything significantly faster than Dictionary<TKey, TValue> and I'd be very surprised to find that it's the bottleneck.

EDIT: I've just benchmarked adding a million elements to a Dictionary<TKey, TValue> where all the keys were existing objects (strings in an array), reusing the same value (as it's irrelevant) and specifying a capacity of a million on construction - and it took about 0.15s on my two-year-old laptop.
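A rough reconstruction of that kind of benchmark (not Jon's actual code; the key-generation scheme here is my own) might look like this:

// Build the keys up front so only the Add calls are inside the timed section.
string[] keys = new string[1000000];
for (int i = 0; i < keys.Length; i++)
{
    keys[i] = "key" + i;
}

Stopwatch stopWatch = Stopwatch.StartNew();

// Capacity specified up front, same value reused for every key.
Dictionary<string, string> dictionary = new Dictionary<string, string>(1000000);
foreach (string key in keys)
{
    dictionary.Add(key, "value");
}

Console.WriteLine(stopWatch.Elapsed);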

Is that really likely to be a bottleneck for you, given that you've already said you're using some "old slow libraries" elsewhere in your app? Bear in mind that the slower those other libraries are, the less impact an improved collection class will have. If the dictionary changes are only accounting for 1% of your overall application time, then even if we could provide an instantaneous dictionary, you'd only speed up your app by 1%.

As ever, get a profiler - it'll give you a much better idea of where your time is going.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • I'm basing it all on my original timing. – Alon Gubkin Dec 08 '09 at 20:08
  • 8
    Dictionary can perform very poorly with custom classes, or even more likely, custom structs, as the key if the hash code implementation is poor. – Reed Copsey Dec 08 '09 at 20:09
  • @Jon: I am executing the same application in Visual Studio with Ctrl + F5. The lowest value I could get is ~00:00:00.0001552, which looks very big compared to yours. Can you please elaborate in a bit more detail on how to test? Thanks in advance, and sorry to bother you. – Saar Dec 08 '09 at 20:15
  • 1
    @Saar: perhaps his machine is faster, or he is running less stuff, or a kabazillion other possibilities. And the time to run 2 Adds will fluctuate a lot. You have to run a kabazillion Adds to get some stable measure. – R. Martinho Fernandes Dec 08 '09 at 20:18
  • @Saar: Run the same code twice in a row... what machine have you got, and what version of .NET are you using? – Jon Skeet Dec 08 '09 at 20:19
  • @Jon: I executed it 5-6 times in a row by hitting Ctrl + F5. My machine is a Quad Core 9550, Vista, .NET Framework 3.5. – Saar Dec 08 '09 at 20:24
  • 3
    @Saar: No, run it *in the same process* more than once, i.e. put it in a method and call it twice. Otherwise you'll be JITting it each time. – Jon Skeet Dec 08 '09 at 20:32
42

I agree with Jon Skeet's supposition that this is most likely JIT compilation.

That being said, I wanted to add some other information here:

Most of the speed issues relating to using Dictionary<T,U> are not related to the implementation of Dictionary. Dictionary<T,U> is VERY fast, out of the box. It would be difficult to beat it.

Speed issues relating to Dictionary instances are almost always actually hash code implementation issues. If you're having speed issues when using Dictionary<MyCustomClass,MyValue>, revisit the GetHashCode() implementation you have defined on MyCustomClass. This is even more critical if you're using a custom struct as your key.

In order to get good performance out of Dictionary, GetHashCode() should be:

  1. Fast
  2. Able to provide hash codes that generate few collisions. Unique instances should, when possible, generate unique hash values.

If you get that right, I think you'll be very happy with the default Dictionary implementation.
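As a minimal sketch (the type itself is made up for illustration), a struct key that satisfies both points might look like this:

// Hypothetical struct key: Equals and GetHashCode are cheap, and the hash
// combines both fields so distinct keys rarely collide.
public struct PointKey : IEquatable<PointKey>
{
    public int X;
    public int Y;

    public PointKey(int x, int y) { X = x; Y = y; }

    public bool Equals(PointKey other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is PointKey && Equals((PointKey)obj);
    }

    public override int GetHashCode()
    {
        // Mix both fields instead of returning just one of them.
        unchecked { return (X * 397) ^ Y; }
    }
}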

Community
  • 1
  • 1
Reed Copsey
  • 554,122
  • 78
  • 1,158
  • 1,373
  • 5
    If you cannot have unique hash code values, then the performance of the Equals method in your key class is also important. – sweetfa Aug 16 '13 at 07:53
9

Don't forget, you're timing the Dictionary constructor in that code as well. I did a test, moving the call to the constructor out of the measurement, and looped 10 times. Here's my test code:

for (int i = 0; i < 10; i++)
{
    Dictionary<string, string> test = new Dictionary<string, string>();

    System.Diagnostics.Stopwatch watch = System.Diagnostics.Stopwatch.StartNew();

    test.Add("fieldName", "fieldValue");
    test.Add("Title", "fieldavlkajlkdjflkjalkjslkdjfiajwelkrjelrkjavoijl");

    Console.WriteLine(watch.Elapsed);
}

Console.ReadKey();

Below are the results:

00:00:00.0000607
00:00:00.0000025
00:00:00.0000015
00:00:00.0000015
00:00:00.0000016
00:00:00.0000017
00:00:00.0000016
00:00:00.0000016
00:00:00.0000016
00:00:00.0000015

I'm not sure how much faster you could get than that...

Update

Looks like this mirrors Jon Skeet's results too... JIT.

Justin Niessner
  • 242,243
  • 40
  • 408
  • 536
6

USE INTS AS KEYS FOR MAXIMUM PERFORMANCE:

For anyone who came here from Google: if you want to squeeze every last bit of performance out of a Dictionary, use ints as keys. Here's a benchmark comparing int vs. string keys: https://jacksondunstan.com/articles/2527

The author of the article even mentions that converting strings to ints is worthwhile if you have such a need.

Also, note that the same behavior occurs in some other languages, like PHP. PHP associative arrays are in fact dictionaries, and if you use int keys in ascending order in PHP 7, they outperform string keys tremendously.
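As an illustrative sketch (my own, not from the linked article), you can map a small, fixed set of string keys to ints once and keep the hot path entirely on the int-keyed dictionary:

// One-time mapping from string names to int ids (assumed to be built at startup).
Dictionary<string, int> keyIds = new Dictionary<string, int>
{
    { "fieldName", 0 },
    { "Title", 1 }
};

// The frequently used dictionary is keyed by int, not string.
Dictionary<int, string> values = new Dictionary<int, string>();
values.Add(keyIds["fieldName"], "fieldValue");
values.Add(keyIds["Title"], "fieldVaaaaaaaaaaaaaaaaalue");

// Look the id up once, then reuse it on the hot path.
int titleId = keyIds["Title"];
string title = values[titleId]; // hashes an int, not a string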

JamesHoux
  • 2,999
  • 3
  • 32
  • 50
5

If you really need better performance, you're going to have to give up something major - like generics, dynamic memory allocation, etc. All those features sacrifice some performance.

I would avoid using Contains if at all possible and look at TryGetValue etc.
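For example (a sketch of the idea, reusing the question's dictionary variable), a single TryGetValue call avoids hashing the key twice the way a ContainsKey-then-indexer pair does:

// Two lookups: ContainsKey hashes the key, then the indexer hashes it again.
if (dictionary.ContainsKey("Title"))
{
    Console.WriteLine(dictionary["Title"]);
}

// One lookup: TryGetValue hashes the key once and hands back the value if present.
string value;
if (dictionary.TryGetValue("Title", out value))
{
    Console.WriteLine(value);
}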

Cade Roux
  • 88,164
  • 40
  • 182
  • 265
3

Odds are you are not going to find anything much faster than Dictionary. I would just use Dictionary. Then, when you see you are not meeting your perf goals and a profiler indicates that adding/removing from the Dictionary is your bottleneck, you can consider replacing it with a more targeted class.

Note that features such as LINQ do not incur any performance loss if you do not use them.

Michael
  • 54,279
  • 5
  • 125
  • 144
3

Dictionary lets you supply an IEqualityComparer. For strings (or other key types) the default generic comparer may not be the best-performing option. A little ILSpy will show you that the default comparer is used if you don't provide one; if your implementation suffers performance-wise, you can inject your own IEqualityComparer. In the end the dictionary compares the hash code of whatever you provide as a key with the existing hash codes in its list of entries.

So if you have specific needs, perhaps specialize the dictionary in a FastDictionary class that gets at the hash code in a more efficient way.

For your implementation that would be:

var dictionary = new Dictionary<string, string>(StringComparer.Ordinal); 
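If the built-in StringComparer options aren't enough, a minimal sketch of injecting your own comparer could look like this (the class name and the ASCII-only assumption are made up for illustration):

// Hypothetical comparer: assumes keys are ASCII-only and compares ordinally.
public class AsciiOrdinalComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        // Simple multiplicative hash over the characters.
        unchecked
        {
            int hash = 17;
            for (int i = 0; i < s.Length; i++)
            {
                hash = hash * 31 + s[i];
            }
            return hash;
        }
    }
}

var dictionary = new Dictionary<string, string>(new AsciiOrdinalComparer());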
Walter Verhoeven
  • 3,867
  • 27
  • 36
2

How many items do you plan to add to the dictionary? While Dictionary/Hashtable is usually the fastest, depending on what you are doing there may be something faster (i.e. better suited) than a hashtable (the underlying structure in a Dictionary). Depending on the usage, it's possible that SortedList could be faster if combined with some kind of skip list, or even a self-balancing tree or trie, especially if you wish to return a range of values rather than a single value.

A Hashtable is a good fit when:

  1. You know how many items you intend to store before population of the table begins. Dynamic resizing will be very painful!
  2. You have a good hash algorithm with even distribution, which .NET does
  3. There is a good mechanism in place for collision resolution, which .NET does
  4. You are looking for a single value
  5. You can guarantee that all values will be unique

If you're doing some compression, for example, an RB-tree is better than a hashtable.

Source: http://en.wikipedia.org/wiki/Hashtable#Dynamic_resizing
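As a minimal sketch (mine, not from the answer) of the kind of range lookup where an ordered structure such as SortedList beats a hash-based Dictionary:

// SortedList keeps its entries ordered by key, so a contiguous key range
// can be walked directly and the scan can stop early.
SortedList<int, string> sorted = new SortedList<int, string>();
sorted.Add(10, "ten");
sorted.Add(20, "twenty");
sorted.Add(30, "thirty");

int lo = 10, hi = 25;
foreach (KeyValuePair<int, string> pair in sorted)
{
    if (pair.Key < lo) continue;
    if (pair.Key > hi) break;        // keys are sorted, so nothing later can match
    Console.WriteLine(pair.Value);   // prints "ten", then "twenty"
}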

Nate Zaugg
  • 4,202
  • 2
  • 36
  • 53
2

Besides all of the above, please note the following as well:

  1. You can pre-initialize the hash bucket array of the Dictionary object by passing an initial size to the constructor, e.g. 301:
Dictionary<string, string> dictionary = new Dictionary<string, string>( 301 );
  2. Depending on whether you need adds or gets to be faster, you may also find it important to focus optimization on Add/Remove or just on retrieval; sometimes it matters more to locate and retrieve items quickly than to add or remove them quickly. In your example you mentioned the dictionary.Add method, but the question asked for a faster replacement for the whole Dictionary<TKey, TValue> class, so I assume you want not only Add but also Get to be faster. In that case the next point may offer a faster solution for specific patterns of key data.

  3. Faster than Dictionary and SortedList (with int keys) can only be a plain static or dynamically allocated array (string[])... but it is a big-O time/space trade-off.

Explaining: a.1) Dictionary can get values in O(1) (if there are not many hash collisions!). a.2) Dictionary add is sometimes O(1) and sometimes O(n): if you add one item after another, then roughly each time the element count reaches the next prime number the add triggers a resize and costs O(n) rather than O(1). Source: Understanding Generic Dictionary in-depth

b.1) An array element is accessed simply by an int index into a pre-allocated memory segment: Array[index] (time complexity O(1)). Hence it is always faster than the chain of operations a dictionary performs: LoopSearchInEntryListTargetElement(TransformToBucketArrayIndex(GetHashCode()))

The entry list may be iterated 1 to 100 times in case of collisions.

b.2) Setting a value in an array is likewise just an indexed assignment in memory (time complexity O(1)); in the case of a Dictionary this sometimes requires a resize and/or reorganization.

In your case: if you know that the number of distinct key strings is no more than uint.MaxValue (an unsigned 32-bit integer, in a 32-bit environment) and the maximum length of any key is no more than 4 (assuming the charset runs from char(0) to char(255)), you could easily transform any such string into a corresponding int value (used as an index into our string[]) to write or read a string value in the fastest way possible.

That would always be O(1) time complexity for both getting and assigning a value in the array. (Contains(TKey) could be written as TKeyValueArray[index] != null. Note: if values can themselves be null in your scenario, then create a custom class or generic struct similar to KeyValuePair but with an additional boolean Set/NotSet flag.)

Rough example (hint): take the character codes and do simple math on each char code at string indices [0, 1, 2, 3]:

(
      index =
          SomeKeyString [ 0 ] * 256 * 256 * 256
        + SomeKeyString [ 1 ] * 256 * 256
        + SomeKeyString [ 2 ] * 256
        + SomeKeyString [ 3 ] 
)

The formula and approach can be optimized per case (if the strings contain only Latin-1 alphabet characters, then there is no need to use as much memory, or you can represent longer TKey strings in your array). This is for cases of desperate need of performance.
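A minimal C# sketch of the idea (entirely illustrative: scaled down to keys of at most 2 characters so the backing array stays small, and assuming char(0) never appears in a key so keys of different lengths cannot collide):

class ArrayBackedMap
{
    // 256 * 256 slots cover every possible key of up to 2 chars in the 0-255 range.
    private readonly string[] table = new string[256 * 256];

    private static int IndexOf(string key)
    {
        int index = 0;
        for (int i = 0; i < key.Length; i++)
        {
            index = index * 256 + key[i];
        }
        return index;
    }

    public void Add(string key, string value) { table[IndexOf(key)] = value; }

    public string Get(string key) { return table[IndexOf(key)]; }

    // Works only if null is never stored as a value.
    public bool Contains(string key) { return table[IndexOf(key)] != null; }
}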

The Latin-1 alphabet uses 191 characters: ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1", consisting of 191 characters from the Latin script...

Sorry for providing only roughly explained hints; I will try to provide a more detailed answer if there is interest.

Also, please read this: Initial capacity of collection types, e.g. Dictionary, List.

1

Could you use a List and define an enum such that, for example, fieldName = 0, Title = 1, and use each property's unique index as a lookup index into the list? That would be the fastest solution, though the least flexible, since you'd be tied to an enum.
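A minimal sketch of that idea (the enum and its members are made up for illustration):

// Hypothetical field enum; each member's value doubles as its index in the list.
enum Field
{
    FieldName = 0,
    Title = 1
}

// Values must be added in the same order as the enum values.
List<string> fields = new List<string> { "fieldValue", "fieldVaaaaaaaaaaaaaaaaalue" };

string title = fields[(int)Field.Title]; // direct index lookup, no hashing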

Paul Sasik
  • 79,492
  • 20
  • 149
  • 189