
std::unordered_map is implemented as a hash table. According to this:

> In the case of single-element insertions, the worst-case time complexity is O(linear in container size) and the average case is O(1).

Now I am using an std::unordered_map<int, int> to update the frequencies of elements in an input array.

My problem is that I don't understand how to determine when the average case applies and when the worst case will occur.
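For reference, the counting loop looks roughly like this (the input values are just made up for illustration):

```cpp
#include <iostream>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<int> input{1, 3, 3, 7, 1, 3}; // example data, invented

    // operator[] default-constructs the mapped value (0 for int) on the
    // first access to a key, so one line handles both new and seen keys.
    std::unordered_map<int, int> freq;
    for (int x : input) {
        ++freq[x];
    }

    for (const auto& [value, count] : freq) {
        std::cout << value << " -> " << count << '\n';
    }
}
```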

  • Ask yourself: what happens when hashes collide? – NathanOliver Sep 24 '18 at 00:39
  • You determine that by determining the _quality of your hash function_. Your particular hash function is `std::hash`, which - unless your compiler is both obscure and intentionally hostile - is a good-quality hash function. – Drew Dormann Sep 24 '18 at 01:11
  • [Understanding hashtable performance in the worst-case](https://cs.stackexchange.com/questions/57564/understanding-hashtable-performance-in-the-worst-case) – Bernhard Barker Sep 24 '18 at 08:43

1 Answer


There are two situations in which the worst case can occur. First, if your hash table is full, it has to be expanded, which involves rehashing all elements. When is the hash table considered full? There is a parameter called the load factor, defined as the ratio number_of_elements / number_of_buckets. When the load factor exceeds max_load_factor, the hash table is expanded. By default, unordered_map containers have a max_load_factor of 1.0. Therefore, if your insert triggers rehashing, it won't be O(1).
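Here is a small sketch that makes the rehashing visible; the exact growth points and bucket counts are implementation-defined:

```cpp
#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    std::cout << "max_load_factor: " << m.max_load_factor() << '\n'; // 1.0 by default

    for (int i = 0; i < 100; ++i) {
        const auto before = m.bucket_count();
        m[i] = i; // single-element insertion
        if (m.bucket_count() != before) {
            // This insertion pushed load_factor() above max_load_factor(),
            // so the table grew and every existing element was rehashed.
            std::cout << "insert #" << i << " rehashed: " << before
                      << " -> " << m.bucket_count() << " buckets\n";
        }
    }

    // If you know the size in advance, reserve() sizes the table up front
    // and avoids these intermediate rehashes entirely.
    std::unordered_map<int, int> n;
    n.reserve(100);
}
```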

The second case depends on your hash table's collision-resolution technique. The most popular techniques are chaining, linear probing, and double hashing. Due to certain requirements imposed by the C++ standard, all practical implementations of std::unordered_map use chaining for collision resolution. In short, chaining means that all entries in the same bucket are organized as a linked list (or a BST in some recent implementations), so adding a new element requires traversing the list. In theory, with a non-uniform hash function or a pathological input, all entries can end up in the same bucket, and the complexity of adding a new element really can become O(linear in container size). As others have already mentioned, std::hash<int> is a good hash function, so in practice you don't have to worry much about this.
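To make the second case concrete, here is a contrived sketch (BadHash is an invented, deliberately terrible hash functor, nothing like the real std::hash) that forces every key into the same bucket:

```cpp
#include <cstddef>
#include <iostream>
#include <unordered_map>

// Pathological hash: every key maps to the same value, so every entry
// lands in the same bucket and that bucket's chain grows linearly.
struct BadHash {
    std::size_t operator()(int) const { return 0; }
};

int main() {
    std::unordered_map<int, int, BadHash> m;
    for (int i = 0; i < 8; ++i) {
        m[i] = i; // each insertion walks the ever-growing chain: O(n) here
    }

    // All 8 elements share one bucket; inserting into it is a list traversal.
    std::cout << "elements in key 0's bucket: "
              << m.bucket_size(m.bucket(0)) << " of " << m.size() << '\n';
}
```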

Miljen Mikic