
Consider, for example, the documentation for the .NET Framework 4.5 `Dictionary<TKey, TValue>` class:

In the remarks for the `.ContainsKey` method, they state that

This method approaches an O(1) operation.

And in the remarks for the `.Count` property, they state that

Retrieving the value of this property is an O(1) operation.

Note that I am not necessarily asking for the details of C#, .NET, Dictionary or what Big O notation is in general. I just found this distinction of "approaches" intriguing.

Is there any difference? If so, how significant can it potentially be? Should I pay attention to it?

Mogsdad
David S.
  • `ContainsKey` might search all items; `Count` must count all items. – Marvin Smit Nov 14 '13 at 21:49
  • @MarvinSmit: `ContainsKey` doesn't have to _search_ and `Count` does not have to count. `ContainsKey` uses `GetHashCode` and `Count` is just a property that is already set. – Tim Schmelter Nov 14 '13 at 21:52
  • possible duplicate of [Constant Amortized Time](http://stackoverflow.com/questions/200384/constant-amortized-time) – Hans Passant Nov 14 '13 at 21:57
  • Maybe the documentation is wrong or misleading. Maybe it's something like `O(log n)` and since they don't figure `log n` [can/will grow](http://stackoverflow.com/questions/8893374/list-vs-dictionary-maximum-size-number-of-elements) beyond some relatively small constant, it's basically `O(1)`. – Tim S. Nov 14 '13 at 21:59
  • @TimS. No, the contains key check is not actually log(n). If it was, it'd say that. A SortedDictionary is in fact log(n) for ContainsKey. – Servy Nov 14 '13 at 22:01
  • @Servy `O(log(log(n)))` then? Just a guess. =) In any case, the documentation is saying, "it's gonna be fast", so you probably don't need to worry about it. – Tim S. Nov 14 '13 at 22:02
  • @TimS. The reason that it's not technically O(1) is that you need to account for hash collisions. The average number of collisions per bucket is dependent on the particular hash function, and the items you put in it. In most cases it really is technically O(1), it's just that you can't *guarantee* that it's O(1), so they need to use the given wording. Even when it's not O(1), the actual function to describe the average number of collisions with the object is going to be complex, and may not even be dependent on the number of items in the set. – Servy Nov 14 '13 at 22:04

4 Answers


If the hash function used by the keys in the hash table is "good", collisions will be very rare. Odds are that there will be only one item in a given hash bucket, maybe two, and almost never more. If you could say for certain that there would never be more than c items in a bucket (where c is a constant), then the operation would be O(c), which is O(1). But that assurance can't be made. It's possible that you just happen to have n different keys that, unluckily enough, all collide and end up in the same bucket, and in that case `ContainsKey` is O(n). It's also possible that the hash function simply isn't "good" and produces collisions frequently, which can make the actual contains check worse than O(1).
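You can demonstrate that degenerate case in any language with hash-based dictionaries. Here's a Python sketch (the `CollidingKey` class is purely illustrative, not anything from .NET) where every key hashes to the same value, so a lookup has to compare against many keys instead of roughly one:

```python
class CollidingKey:
    """Illustrative key type whose hash is always the same,
    forcing every entry into the same bucket/probe chain."""
    eq_calls = 0  # counts equality comparisons performed by lookups

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 0  # pathological hash: every key collides

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.n == other.n

# Insert 1000 all-colliding keys.
d = {CollidingKey(i): i for i in range(1000)}

CollidingKey.eq_calls = 0
assert CollidingKey(999) in d      # analogue of ContainsKey
worst = CollidingKey.eq_calls      # had to wade through the whole chain

CollidingKey.eq_calls = 0
assert CollidingKey(0) in d
best = CollidingKey.eq_calls       # found almost immediately

print(worst, best)
```

With a good hash function the number of comparisons stays near 1 regardless of dictionary size; with this worst-case hash, the lookup cost grows linearly with the number of items, which is exactly why the docs hedge with "approaches".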

Servy

It's because `Dictionary` is a hash table implementation: a key lookup uses a hashing function to determine which of the data structure's many buckets contains the value you're looking up. For a good hashing function, and assuming a large enough set of buckets, each bucket usually contains only a single element, in which case the complexity is indeed O(1). Unfortunately, this is not always true: the hashing function may have clashes, in which case a bucket may contain more than one entry, and the algorithm has to iterate through the bucket until it finds the entry you're looking for. So it's no longer O(1) in these (hopefully) rare cases.
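The bucket mechanics above can be sketched with a toy separate-chaining hash table (a simplification for illustration; .NET's actual `Dictionary` internals differ): finding the bucket is O(1), but the scan inside the bucket depends on how many keys collided there.

```python
class ToyHashMap:
    """Toy separate-chaining hash table: hash picks a bucket,
    then a linear scan of that bucket finds the key."""

    def __init__(self, num_buckets=16):
        # Each bucket is a list of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function tells us which bucket to look in: O(1).
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing entry
                return
        bucket.append((key, value))

    def contains_key(self, key):
        # O(1) to find the bucket, plus a scan whose length depends
        # on how many keys happened to collide into this bucket.
        return any(k == key for k, _ in self._bucket(key))

m = ToyHashMap()
m.put("alpha", 1)
m.put("beta", 2)
print(m.contains_key("alpha"), m.contains_key("gamma"))  # True False
```

When the keys spread evenly, each bucket holds about one entry and `contains_key` does roughly one comparison; only collisions make the scan longer.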

ishaaq

O(1) is constant running time. "Approaches O(1)" means close to constant but not quite; for most purposes, though, the difference is negligible. You should not pay attention to it.

Konstantin

Something either is or isn't O(1). I think what they're trying to say is that the running time per operation is approximately O(1) over a large number of operations.

Daniel