112

Why do I keep seeing different runtime complexities for these functions on a hash table?

On Wikipedia, search and delete are listed as O(n) (I thought the point of hash tables was constant-time lookup, so what's the point if search is O(n)?).

In some course notes from a while ago, I see a wide range of complexities depending on implementation details, including one variant where everything is O(1). Why would any other implementation be used if I can get all O(1)?

If I'm using standard hash tables in a language like C++ or Java, what can I expect the time complexity to be?

user1136342
  • 4,731
  • 10
  • 30
  • 40
a perfect hash is O(1) lookup, but for that you have to know what the data will be when you design the table. – Mooing Duck Feb 09 '12 at 16:15
  • 3
    O(n) is worst case, O(1) is average case. In the worst case, you could be inserting N elements all of which hash to the same bucket. Then, for this data set, deletion and search will also be O(n). – Larry Watanabe Feb 09 '12 at 16:25
  • related: ["Time complexity of Hash table"](http://stackoverflow.com/questions/3949217/time-complexity-of-hash-table) – David Cary May 24 '15 at 13:32

5 Answers

209

Hash tables are O(1) average and amortized case complexity; however, they suffer from O(n) worst case time complexity. [And I think this is where your confusion is.]

Hash tables suffer from O(n) worst-case time complexity for two reasons:

  1. If too many elements hash into the same bucket, looking inside that bucket may take O(n) time.
  2. Once a hash table has exceeded its load factor, it has to rehash [create a new, bigger table and re-insert each element into it].

However, it is said to be O(1) average and amortized case because:

  1. It is very rare that many items will hash into the same bucket [if you chose a good hash function and the load factor is not too high].
  2. The rehash operation, which is O(n), can happen at most once every n/2 ops, all of which are assumed O(1). Thus, when you sum the average time per op, you get: (n*O(1) + O(n)) / n = O(1). A small demonstration of this amortized argument follows below.
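
To see the amortized argument with concrete numbers, here is a minimal sketch (not from the answer itself; the `RehashCounter` name, the starting capacity, and the 0.5 load-factor threshold are assumptions made purely for illustration) that counts how many element copies a doubling table performs while inserting n items:

```java
// Toy illustration: count the element copies caused by rehashing in a table
// that doubles whenever it becomes half full. It only tracks sizes (it is not
// a real hash table), but it shows why insertion is O(1) amortized.
public class RehashCounter {
    public static void main(String[] args) {
        int capacity = 8;   // current table size
        int size = 0;       // number of stored elements
        long moves = 0;     // total element copies caused by rehashing
        int n = 1_000_000;

        for (int i = 0; i < n; i++) {
            if (size >= capacity / 2) {   // load factor 0.5 exceeded -> rehash
                moves += size;            // every existing element is re-inserted
                capacity *= 2;
            }
            size++;                       // the ordinary O(1) insertion itself
        }

        // Total work = n ordinary insertions + 'moves' re-insertions.
        System.out.printf("insertions=%d, rehash copies=%d, average cost per op=%.2f%n",
                n, moves, (n + (double) moves) / n);
        // Prints an average of roughly 2 operations per insert, i.e. O(1) amortized.
    }
}
```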

Note that because of the rehashing issue, real-time applications and applications that need low latency should not use a hash table as their data structure.

EDIT: Another issue with hash tables: cache
Another place where you might see a performance loss with large hash tables is cache performance. Hash tables suffer from poor cache locality, so for large collections the access time might be longer, since you need to reload the relevant part of the table from memory back into the cache.

Ben Hoyt
  • 10,694
  • 5
  • 60
  • 84
amit
  • 175,853
  • 27
  • 231
  • 333
  • Thanks- I think I understand. So if I was asked during an exam or an interview to come up with a data structure that performs lookup in O(1), do you know if including a hash table would be fine? – user1136342 Feb 09 '12 at 16:24
  • 1
    @user1136342: It depends if you need worst case or average case. For average case, hash tables are `O(1)`. If you need worst case - hash table will not be enough. – amit Feb 09 '12 at 16:29
  • 2
    Wikipedia says the worst case [can be reduced](https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_other_structures) from `O(n)` to `O(log n)` by using a more complex data structure within each bucket. (I guess this could be considered overkill if the hashtable is already using a good cryptographic hash, which would prevent collisions even from an attacker.) – joeytwiddle May 05 '20 at 12:24
  • @joeytwiddle a sorted array as a secondary data structure is not hard to do and then you can indeed guarantee O(log(n)) worst case for lookups. There are other hash tables that can guarantee O(log(n)) worst case for lookups like [hash ordering](https://1ykos.github.io/patchmap) and by using a perfect hash table of size n² as a secondary data structure you can even guarantee O(1) worst case lookups. – Wolfgang Brehm Jul 02 '20 at 11:09
Sorry to nitpick, but not all hash tables are like this; some have tighter worst-case bounds on lookups than O(n). Maybe you could write: "Most hash table implementations suffer from O(n) worst time complexity due to two reasons:" – Wolfgang Brehm Jul 02 '20 at 11:11
  • I have a relevant question. When we say O(1), doesn't it already mean the worst case for the scenario? If we are going to say an average time complexity for a case, shouldn't we say Θ(1)? – codexplorer Aug 16 '21 at 05:57
  • 1
    @codexplorer No. Theta/big O/... are all about the bound - and unrelated to how you analyze your algorithm. I tried to explain about it a bit in [this thread](https://stackoverflow.com/a/12338937/572670). – amit Aug 16 '21 at 06:01
27

Ideally, a hashtable is O(1). The problem arises when two keys are not equal but result in the same hash.

For example, imagine the strings "it was the best of times it was the worst of times" and "Green Eggs and Ham" both resulted in a hash value of 123.

When the first string is inserted, it's put in bucket 123. When the second string is inserted, it would see that a value already exists for bucket 123. It would then compare the new value to the existing value, and see they are not equal. In this case, an array or linked list is created for that key. At this point, retrieving this value becomes O(n) as the hashtable needs to iterate through each value in that bucket to find the desired one.
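
To make this concrete, here is a toy sketch (not from this answer; the class name and the deliberately bad hash function are invented for the illustration) of a separate-chaining table where every key lands in bucket 123, so a lookup degrades to a linear scan of that bucket:

```java
import java.util.ArrayList;
import java.util.List;

// Toy separate-chaining table with a deliberately bad hash function,
// so every key collides and lookups degrade to scanning one bucket.
public class CollidingTable {
    private final List<List<String[]>> buckets = new ArrayList<>();

    public CollidingTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
    }

    // Bad hash: every key maps to bucket 123.
    private int badHash(String key) {
        return 123 % buckets.size();
    }

    public void put(String key, String value) {
        buckets.get(badHash(key)).add(new String[] { key, value });  // entry = {key, value}
    }

    public String get(String key) {
        // Linear scan of the bucket: O(bucket size), i.e. O(n) when all keys collide.
        for (String[] entry : buckets.get(badHash(key))) {
            if (entry[0].equals(key)) return entry[1];
        }
        return null;
    }

    public static void main(String[] args) {
        CollidingTable table = new CollidingTable(1000);
        table.put("it was the best of times it was the worst of times", "Dickens");
        table.put("Green Eggs and Ham", "Seuss");
        System.out.println(table.get("Green Eggs and Ham")); // found only after scanning the bucket
    }
}
```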

For this reason, when using a hash table, it's important to use a key type with a really good hash function that's both fast and rarely produces the same hash for different objects.

Make sense?

Mike Christensen
  • 88,082
  • 50
  • 208
  • 326
  • 1
    `as the hashtable needs to iterate through each value in that bucket` But the bucket doesn't contain `n` items, just those that hashed to that particular key? – SamAko Sep 04 '18 at 22:56
  • 1
    Note: Instead of a linked list, a balanced tree can be used to achieve lg(n) retrieval as is done in Java 8+. – EntangledLoops Feb 09 '19 at 21:02
  • 2
    @T.Rex: in the worst case scenario, the bucket would have `n` items – jose Nov 21 '19 at 13:00
11

Some hash tables (cuckoo hashing) have guaranteed O(1) lookup
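
For intuition, here is a heavily simplified sketch of the idea (not production code and not from this answer; the class name, table sizes, and hash functions are arbitrary, and a real implementation would rehash instead of giving up when insertion fails): every key can only ever sit in one of two slots, so a lookup probes at most two positions.

```java
// Simplified cuckoo hash table for int keys: each key may live in exactly
// one of two slots, so contains() inspects at most two positions, giving
// O(1) worst-case lookup. Insertion is the hard part; here it simply stops
// after a bounded number of displacements instead of rehashing.
public class MiniCuckoo {
    private static final int EMPTY = Integer.MIN_VALUE;  // sentinel; don't insert this value
    private final int[] table1 = new int[37];
    private final int[] table2 = new int[37];

    public MiniCuckoo() {
        java.util.Arrays.fill(table1, EMPTY);
        java.util.Arrays.fill(table2, EMPTY);
    }

    private int h1(int key) { return Math.floorMod(key, table1.length); }
    private int h2(int key) { return Math.floorMod(key * 31 + 7, table2.length); }

    // Worst-case O(1): only two slots can ever hold the key.
    public boolean contains(int key) {
        return table1[h1(key)] == key || table2[h2(key)] == key;
    }

    public boolean insert(int key) {
        if (contains(key)) return true;
        int current = key;
        for (int i = 0; i < 32; i++) {          // bounded displacement chain
            int pos1 = h1(current);
            int evicted = table1[pos1];
            table1[pos1] = current;             // place in table1, kicking out any occupant
            if (evicted == EMPTY) return true;
            int pos2 = h2(evicted);
            int evicted2 = table2[pos2];
            table2[pos2] = evicted;             // move the occupant to its table2 slot
            if (evicted2 == EMPTY) return true;
            current = evicted2;                 // keep displacing
        }
        return false;                           // a real implementation would rehash here
    }

    public static void main(String[] args) {
        MiniCuckoo set = new MiniCuckoo();
        for (int k = 1; k <= 20; k++) set.insert(k);
        System.out.println(set.contains(7));    // true, found in at most two probes
        System.out.println(set.contains(99));   // false, also decided in at most two probes
    }
}
```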

Demi
  • 3,535
  • 5
  • 29
  • 45
8

Perhaps you were looking at the space complexity? That is O(n). The other complexities are as listed on the hash table entry. The search complexity approaches O(1) as the number of buckets increases. In the worst case, where you have only one bucket in the hash table, the search complexity is O(n).

Edit in response to comment: I don't think it is correct to say O(1) is the average case. It really is (as the Wikipedia page says) O(1 + n/k), where k is the number of buckets in the hash table. If k is large enough, then the result is effectively O(1). But suppose k is 10 and n is 100. In that case each bucket will have on average 10 entries, so the search time is definitely not O(1); it is a linear search through up to 10 entries.
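
In practice, standard library tables keep n/k bounded by growing the bucket array automatically. As a rough illustration (this is just standard Java HashMap usage, not code from this answer), Java lets you control both knobs directly:

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Initial capacity 16, load factor 0.75: the table is doubled and
        // rehashed once size exceeds capacity * 0.75, which keeps the average
        // chain length n/k bounded by a constant, hence O(1) average lookups.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);

        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);      // triggers a few automatic rehashes along the way
        }

        System.out.println(map.get("key42"));   // prints 42; average-case O(1) lookup
    }
}
```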

Mark Wilkins
  • 40,729
  • 5
  • 57
  • 110
  • Oh- I was just looking at worst case. So to be clear, when people say O(1) they just mean average case? – user1136342 Feb 09 '12 at 16:13
  • @user1136342: Edited the answer to try to clarify this. – Mark Wilkins Feb 09 '12 at 16:22
  • 2
Usually the [load factor](http://en.wikipedia.org/wiki/Hash_table) for hash tables is `table_size/8 <= #elements <= table_size/2`, so it comes back to `O(1)`. Though, if the size of the table is dynamic, there is still the rehashing issue, which makes the worst case `O(n)` as well. Look at my answer for details and explanation. – amit Feb 09 '12 at 16:28
2

Depends on how you implement hashing: in the worst case it can go to O(n), in the best case it is O(1) (generally you can achieve that easily if your data structure is not that big).

jmj
  • 237,923
  • 42
  • 401
  • 438