I'm using an application that uses a number of large dictionaries (up to 10^6 elements), the size of which is unknown in advance (though I can guess in some cases). I'm wondering how the dictionary is implemented, i.e. how bad the effect is if I don't give an initial estimate of the dictionary size. Does it internally use a (self-growing) array in the way List does? In that case, letting the dictionaries grow might leave a lot of large unreferenced arrays on the LOH.
-
[An Extensive Examination of Data Structures Using C# 2.0](http://msdn.microsoft.com/en-US/library/ms379571%28v=vs.80%29.aspx) - The Queue, Stack, and Hashtable. – mihai Dec 16 '14 at 10:09
5 Answers
Using Reflector, I found the following: The Dictionary keeps the data in a struct array. It keeps a count of how many empty places are left in that array. When you add an item and no empty place is left, it increases the size of the internal array (see below) and copies the data from the old array to the new array.
So I suggest using the constructor that sets the initial size if you know there will be many entries.
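The constructor suggestion above could look roughly like the following. This is a minimal sketch: the class and method names (`PresizeExample`, `BuildPresized`) and the key/value types are made up for illustration, and only the `Dictionary<TKey, TValue>(int capacity)` constructor is taken from the framework.

```csharp
using System;
using System.Collections.Generic;

public static class PresizeExample
{
    // Builds a dictionary with expectedCount entries. The capacity is passed
    // to the constructor so the internal array is allocated once instead of
    // being grown and copied several times while the dictionary fills up.
    public static Dictionary<int, string> BuildPresized(int expectedCount)
    {
        var map = new Dictionary<int, string>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            map.Add(i, "value " + i);
        }
        return map;
    }

    public static void Main()
    {
        // 1000000 matches the upper bound mentioned in the question.
        Dictionary<int, string> map = BuildPresized(1000000);
        Console.WriteLine(map.Count);
    }
}
```

If the estimate is too low, the dictionary still grows as usual; the estimate only saves the resizes below it.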
EDIT: The logic is actually quite interesting: There is an internal class called HashHelpers to find primes. To speed this up, it also has stored some primes in a static array from 3 up to 7199369 (some are missing; for the reason, see below). When you supply a capacity, it finds the next prime (same value or larger) from the array, and uses that as initial capacity. If you give it a larger number than in its array, it starts checking manually.
So if nothing is passed as capacity to the Dictionary, the starting capacity is three.
Once the capacity is exceeded, it multiplies the current capacity by two and then finds the next larger prime using the helper class. That is why the array does not need to contain every prime: primes that are "too close together" would never be used.
So if we pass no initial value, we would get (I checked the internal array):
- 3
- 7
- 17
- 37
- 71
- 163
- 353
- 761
- 1597
- 3371
- 7013
- 14591
- 30293
- 62851
- 130363
- 270371
- 560689
- 1162687
- 2411033
- 4999559
Once we pass this size, the next step falls outside the internal array, and it will manually search for larger primes. This will be quite slow. You could initialize with 7199369 (the largest value in the array), or consider if having more than about 5 million entries in a Dictionary might mean that you should reconsider your design.
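The growth rule described in this answer (double the capacity, then round up to a prime) can be sketched as follows. This is not the framework's code: `GrowthSketch`, `IsPrime`, `NextPrime` and `Grow` are hypothetical names, and the sketch finds primes by trial division instead of consulting the precomputed array, so the sizes it produces are not necessarily the ones the framework picks.

```csharp
using System;

public static class GrowthSketch
{
    // Trial division; adequate for a sketch, slow for very large values.
    public static bool IsPrime(int candidate)
    {
        if (candidate < 2) return false;
        if (candidate % 2 == 0) return candidate == 2;
        for (int divisor = 3; (long)divisor * divisor <= candidate; divisor += 2)
        {
            if (candidate % divisor == 0) return false;
        }
        return true;
    }

    // Smallest prime that is greater than or equal to min.
    public static int NextPrime(int min)
    {
        int candidate = min;
        while (!IsPrime(candidate)) candidate++;
        return candidate;
    }

    // Assumed growth step: double, then round up to a prime.
    // Overflow for very large capacities is ignored here.
    public static int Grow(int capacity)
    {
        return NextPrime(capacity * 2);
    }

    public static void Main()
    {
        int capacity = 3;
        for (int step = 0; step < 5; step++)
        {
            Console.WriteLine(capacity);
            capacity = Grow(capacity);
        }
    }
}
```

Searching manually like this on every resize is the slow path the answer warns about for sizes beyond the stored primes.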

-
@Albin Depending on the actual number, between 16 and 20 rehashings. @Rangoric It starts at 3. – Daniel Rose Aug 19 '10 at 14:35
-
Great answer, but how do we choose the index in the internal array at which to place our value? Do we take the value returned by GetHashCode and divide it by the internal array size to find the index for our value? – Dzmitry Oct 01 '11 at 13:55
-
@Dzmitry It's a bit more complicated. There is a buckets list, entries list, and free entries list. See http://www.simple-talk.com/community/blogs/simonc/archive/2011/09/16/103362.aspx for details. – Daniel Rose Oct 03 '11 at 19:12
-
What's the significance or advantage of always using a prime when increasing capacity? – mike01010 Feb 13 '12 at 01:07
-
@mike That is due to the way hash tables are used. See http://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus – Daniel Rose Feb 13 '12 at 11:44
-
Many thanks, your answer led me to check the source code for the constructor, and it is a bit scary: when you provide the capacity in advance, it is also rounded up to a prime number, meaning that even for storing collections of known size there can be a lot of wasted memory. – greenoldman Feb 26 '22 at 18:52
MSDN says: "Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table." and further on "the capacity is automatically increased as required by reallocating the internal array."
But you get fewer reallocations if you give an initial estimate. If you have all items from the beginning, the LINQ method ToDictionary might be handy.

-
`ToDictionary` doesn't preallocate the dictionary at all - all it does is add elements one at a time until it's finished. If you know (or can guess) the size of the dictionary ahead of time, you might be better off creating a dictionary yourself and iteratively adding them yourself. – Zac Faragher Aug 23 '17 at 06:15
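The two approaches compared in this thread can be put side by side as below. This is only a sketch: `FillComparison` and its method names are invented for illustration, the word list is sample data, and whether `ToDictionary` preallocates may depend on the framework version, so the comment's claim is taken as given rather than verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FillComparison
{
    // LINQ version: concise, but the caller has no control over the capacity.
    public static Dictionary<string, int> ViaToDictionary(ICollection<string> words)
    {
        return words.ToDictionary(word => word, word => word.Length);
    }

    // Manual version: the known element count is passed as initial capacity,
    // then the items are added one by one.
    public static Dictionary<string, int> ViaPresizedLoop(ICollection<string> words)
    {
        var lengths = new Dictionary<string, int>(words.Count);
        foreach (string word in words)
        {
            lengths.Add(word, word.Length);
        }
        return lengths;
    }

    public static void Main()
    {
        var words = new List<string> { "alpha", "beta", "gamma" };
        Console.WriteLine(ViaToDictionary(words).Count);
        Console.WriteLine(ViaPresizedLoop(words).Count);
    }
}
```

Both versions throw an ArgumentException on duplicate keys, so they are interchangeable apart from the allocation behaviour.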
Hashtables normally have something called a load factor: once the ratio of entries to buckets reaches this threshold, the backing bucket store is enlarged. IIRC the default is something like 0.72. If you had perfect hashing, this could be increased to 1.0.
Also when the hashtable needs more buckets, the entire collection has to be rehashed.
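To make the load factor and the rehash step concrete, here is a toy separate-chaining hash set. It is not how `Dictionary` or `Hashtable` is actually implemented; `ToyHashSet`, its members, the starting size of 3 buckets and the "double plus one" growth step are all assumptions chosen for the sketch, with 0.72 taken from the figure quoted above.

```csharp
using System;
using System.Collections.Generic;

// Toy hash set of ints with a load-factor threshold (illustration only).
public sealed class ToyHashSet
{
    private const double MaxLoadFactor = 0.72; // figure quoted in the answer
    private List<int>[] buckets = new List<int>[3];

    public int Count { get; private set; }
    public int RehashCount { get; private set; }
    public int BucketCount { get { return buckets.Length; } }

    public bool Add(int value)
    {
        if (Contains(value)) return false;
        // Grow before inserting if the new entry would exceed the threshold.
        if (Count + 1 > buckets.Length * MaxLoadFactor)
        {
            Rehash(buckets.Length * 2 + 1);
        }
        Insert(buckets, value);
        Count++;
        return true;
    }

    public bool Contains(int value)
    {
        List<int> chain = buckets[IndexFor(value, buckets.Length)];
        return chain != null && chain.Contains(value);
    }

    private static int IndexFor(int value, int bucketCount)
    {
        // Clear the sign bit so the remainder is never negative.
        return (value.GetHashCode() & 0x7FFFFFFF) % bucketCount;
    }

    private static void Insert(List<int>[] target, int value)
    {
        int index = IndexFor(value, target.Length);
        if (target[index] == null) target[index] = new List<int>();
        target[index].Add(value);
    }

    // Every stored value gets a new bucket index because the bucket count
    // changed; this is the cost the answer refers to.
    private void Rehash(int newBucketCount)
    {
        var bigger = new List<int>[newBucketCount];
        foreach (List<int> chain in buckets)
        {
            if (chain == null) continue;
            foreach (int value in chain) Insert(bigger, value);
        }
        buckets = bigger;
        RehashCount++;
    }
}

public static class ToyHashSetDemo
{
    public static void Main()
    {
        var set = new ToyHashSet();
        for (int i = 0; i < 100; i++) set.Add(i);
        Console.WriteLine(set.Count + " entries, " + set.RehashCount + " rehashes");
    }
}
```

Starting with enough buckets for the expected number of entries avoids the intermediate rehashes entirely, which is the same argument as for passing an initial capacity.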

The best way for me would be to use the .NET Reflector.
http://www.red-gate.com/products/reflector/
Use the disassembled code to see the implementation.

- JSON as dictionary:

      {
        "Details":
        {
          "ApiKey": 50125
        }
      }

- The model should contain a property of type Dictionary:

      public Dictionary<string, string> Details { get; set; }

- Iterate with a foreach() block whose element type is KeyValuePair:

      foreach (KeyValuePair<string, string> dict in Details)
      {
          switch (dict.Key)
          {
              case nameof(settings.ApiKey):
                  // Parse the string value; ApiKey stays 0 if parsing fails.
                  int.TryParse(dict.Value, out int ApiKey);
                  settings.ApiKey = ApiKey;
                  break;
              default:
                  break;
          }
      }
