
I have read on Wikipedia that hash tables have an average search time of O(1).

So let's say I have a very large dictionary that contains maybe tens of millions of records.

If I use `Dictionary.ContainsKey` to look up a given key, will the lookup time really be O(1), or will it be more like O(log n) or something else because of the internal implementation in .NET?

Soner Gönül
Win Coder
  • Still `O(1)`. If it depended on the number of elements, you would see an `n` in the formula :) – I4V Aug 24 '13 at 19:15

3 Answers


Big Oh notation doesn't tell you how long something takes. It tells you how it scales.

The easiest one to envision is searching for an item in a List<>; it has O(n) complexity. If it takes, on average, 2 milliseconds to find an item in a list with a million elements, then you can expect it to take 4 milliseconds if the list has two million elements. It scales linearly with the size of the list.

O(1) predicts constant time for finding an element in a dictionary. In other words, it doesn't depend on the size of the dictionary. If the dictionary is twice as big, it doesn't take twice as long to find the element; it takes (roughly) the same amount of time. The "roughly" means that it can actually take a bit longer: it is amortized O(1).
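A rough, runnable sketch of that difference (the class name is made up, and the timings are only illustrative, not a rigorous benchmark): doubling `n` roughly doubles the `List<>` search time, while the `Dictionary<,>` lookup stays flat.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class ScalingSketch
{
    static void Main()
    {
        const int n = 10000000;

        var list = Enumerable.Range(0, n).ToList();
        var dict = Enumerable.Range(0, n).ToDictionary(i => i, i => i);

        var sw = Stopwatch.StartNew();
        list.Contains(n - 1);             // walks the whole list: O(n)
        Console.WriteLine("List<int>.Contains:              {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        dict.ContainsKey(n - 1);          // one hash computation + bucket probe: O(1)
        Console.WriteLine("Dictionary<int,int>.ContainsKey: {0} ms", sw.ElapsedMilliseconds);
    }
}
```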

Hans Passant

It would still be close to O(1), because it still does not depend on the number of entries but on the number of collisions you have. Indexing an array is O(1), no matter how many items you have.

Also, there seems to be an upper limit on the size of a Dictionary caused by the implementation: How is the c#/.net 3.5 dictionary implemented?

Once we pass this size, the next step falls outside the internal array, and it will manually search for larger primes. This will be quite slow. You could initialize with 7199369 (the largest value in the array), or consider if having more than about 5 million entries in a Dictionary might mean that you should reconsider your design.
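If you do know the size up front, one option (a sketch based on the linked answer's description of the internal prime table; the exact prime may differ between framework versions) is to pass the capacity to the constructor, so the bucket array is allocated once instead of being grown and rehashed as you add entries:

```csharp
using System;
using System.Collections.Generic;

class PreSizedDictionary
{
    static void Main()
    {
        // 7199369 is the largest prime in the internal size table mentioned above.
        var bigMap = new Dictionary<long, string>(7199369);

        for (long i = 0; i < 7000000; i++)
            bigMap.Add(i, "value " + i);

        Console.WriteLine(bigMap.Count);   // 7000000 entries added without a resize
    }
}
```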

Nemanja Boric
  • I guess that's a problem then. My Dictionary would probably contain more than 7 million entries. Any alternatives? – Win Coder Aug 24 '13 at 19:28
  • @WinCoder Using multiple separated dictionaries? It would still be `O(1)` – Nemanja Boric Aug 24 '13 at 19:29
  • hmmm sorry not following you. how ? – Win Coder Aug 24 '13 at 19:31
  • Well, you can keep track of your dictionary. Once your dictionary gets too big, you can add an additional dictionary. Of course, all this needs to be nicely encapsulated so the user can use it just like a single dictionary. If you want to check whether some key exists, loop through both dictionaries. If you want to add, make sure there is no such key in any of the dictionaries, and then add to the last one, etc. (see the sketch after these comments). Just shooting out some ideas; perhaps this is not the best way to make this work. – Nemanja Boric Aug 24 '13 at 19:36
  • Well thanks, this is all I need. – Win Coder Aug 24 '13 at 19:48
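A minimal sketch of the multi-dictionary idea from the comments above (the `ShardedDictionary` name and the threshold parameter are made up for illustration; only `Add` and `ContainsKey` are shown):

```csharp
using System;
using System.Collections.Generic;

class ShardedDictionary<TKey, TValue>
{
    private readonly List<Dictionary<TKey, TValue>> _shards =
        new List<Dictionary<TKey, TValue>> { new Dictionary<TKey, TValue>() };
    private readonly int _maxPerShard;

    public ShardedDictionary(int maxPerShard)
    {
        _maxPerShard = maxPerShard;
    }

    public bool ContainsKey(TKey key)
    {
        // Each shard lookup is O(1); the number of shards stays small.
        foreach (var shard in _shards)
            if (shard.ContainsKey(key))
                return true;
        return false;
    }

    public void Add(TKey key, TValue value)
    {
        if (ContainsKey(key))
            throw new ArgumentException("Key already present.");

        var last = _shards[_shards.Count - 1];
        if (last.Count >= _maxPerShard)
        {
            // Current shard is "full": start a new one and add there.
            last = new Dictionary<TKey, TValue>();
            _shards.Add(last);
        }
        last.Add(key, value);
    }
}
```

Usage would look like `var map = new ShardedDictionary<long, string>(5000000);`, with the caller none the wiser that the entries are split across several dictionaries.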

What is the key? If the key is Int32 then yes it will be close to order 1.

You only get worse than O(1) if there are hash collisions.
Int32 as a key will produce zero hash-code collisions, but that does not guarantee zero hash bucket collisions.

Be careful with keys that produce hash collisions.
KVP and tuple can create a lot of hash collisions and are not good candidates for a key.
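A sketch of a key type that hashes well (the `PointKey` struct is a made-up example): implement `IEquatable<T>` and combine the fields in `GetHashCode`, rather than relying on the default hashing of a composite type.

```csharp
using System;
using System.Collections.Generic;

struct PointKey : IEquatable<PointKey>
{
    public readonly int X;
    public readonly int Y;

    public PointKey(int x, int y) { X = x; Y = y; }

    public bool Equals(PointKey other) { return X == other.X && Y == other.Y; }

    public override bool Equals(object obj) { return obj is PointKey && Equals((PointKey)obj); }

    public override int GetHashCode()
    {
        // Combine both fields so distinct keys spread across buckets.
        unchecked { return (X * 397) ^ Y; }
    }
}

class PointKeyDemo
{
    static void Main()
    {
        var map = new Dictionary<PointKey, string>();
        map[new PointKey(1, 2)] = "value";
        Console.WriteLine(map.ContainsKey(new PointKey(1, 2)));   // True
    }
}
```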

paparazzo
  • Even if every distinct key returned a different value for `GetHashCode` there may still be hash collisions, since the number of entries in a hash table will almost always be a tiny fraction of the number of possible `GetHashCode` values. `GetHashCode` returns a 32-bit value not because it's expected that every distinct integer will be mapped to a different slot, but because it's easier to have `GetHashCode` always return a 32-bit integer than it would be to have it return values that were scaled according to hash table size. – supercat Aug 25 '13 at 03:52
  • @supercat You confuse hash bucket with hash collision. Int32 as a key produces zero hash collisions. Tiny fraction? If you feel dictionary is not O(1) then post that as an answer. – paparazzo Aug 25 '13 at 13:15
  • A typical hashing collection munges the 32-bit value returned by `GetHashCode` to produce a value in a smaller range which is used to select a bucket. When using such a collection, if one were to add 1,000 items all of which mapped to the same bucket, even if their hash codes were distinct, the collection would have to do some work on each of those 1,000 items (if it caches hash codes, it could merely compare the cached hash code rather than compare the items, but that would still represent O(N) work). – supercat Aug 25 '13 at 15:13
  • @supercat Again. If you feel .NET Hashtable or Dictionary is not O(1) then post it as an answer. The accepted answer states O(1) also. Why bury this in a comment? – paparazzo Aug 25 '13 at 15:22
  • The question is closed as duplicate, so I can't post an answer here. Also, my complaint was with your specific claim that an `int` key guarantees that there will be no collision-related slowdowns. – supercat Aug 25 '13 at 15:43
  • @supercat OK, close to order 1, as stated in the documentation. – paparazzo Aug 25 '13 at 17:54
  • The *average* performance will be O(1) if collisions are random, but there are sequences of inputs for which it would be O(n) even if every key yields a different `GetHashCode`; if an untrusted source is supplying integer keys, it may be good for `GetHashCode` to scramble them using parameters that are randomly picked on startup (e.g. multiply by a randomly-picked odd number and xor with a randomly-picked bitmask) to prevent the deliberate injection of values which would generate many collisions. – supercat Aug 25 '13 at 18:23
  • @supercat I don't have a question. The question is closed and has an accepted answer. Please stop. – paparazzo Aug 25 '13 at 18:34
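supercat's last comment describes randomizing the hash to guard against deliberately colliding keys; a minimal, hypothetical sketch of that idea as a custom `IEqualityComparer<int>` (the class name is made up, and this is not how `Dictionary<,>` hashes keys internally):

```csharp
using System;
using System.Collections.Generic;

sealed class RandomizedIntComparer : IEqualityComparer<int>
{
    private readonly int _multiplier;
    private readonly int _mask;

    public RandomizedIntComparer()
    {
        var rng = new Random();
        _multiplier = rng.Next() | 1;   // randomly-picked odd number
        _mask = rng.Next();             // randomly-picked bitmask
    }

    public bool Equals(int x, int y) { return x == y; }

    public int GetHashCode(int value)
    {
        // Scramble the key so an attacker cannot precompute colliding inputs.
        unchecked { return (value * _multiplier) ^ _mask; }
    }
}

class RandomizedHashDemo
{
    static void Main()
    {
        var map = new Dictionary<int, string>(new RandomizedIntComparer());
        map[42] = "answer";
        Console.WriteLine(map.ContainsKey(42));   // True
    }
}
```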