6

As I know, std::unordered_map is used for fast access to elements. This achieves by storing and comparing the key hash instead of the key itself. Also, unordered means that the elements in it are not sorted. But fast access to elements requires that the items have sorted to be able to find the requested item using binary search.

  • Does it mean the items in an unordered_map are sorted based on their hash key and the only reason that causes an unordered_map to be faster than map to access elements is comparing hash values is usually much more faster than comparing key values?
  • If so, choosing between unordered_map and map depends on the type of key. Am I right?
  • And the last question is why an unordered_map does not get a Compare template parameter like what the map does? How does an unordered_map compare the key hashes just by an equal operator?

    template <class Key,
              class T,
              class Compare = less<Key>,
              class Alloc = allocator<pair<const Key,T> >
              > class map;
    
    template <class Key,
              class T,
              class Hash = hash<Key>,
              class Pred = equal_to<Key>,
              class Alloc = allocator< pair<const Key,T> >
              > class unordered_map;
    
wxShayan
  • 288
  • 1
  • 4
  • 12
  • Keys are not sorted by hash in a hash map. The hash is subject to a modulus and that result is used as a random access index into an array of buckets. – Richard Hodges Apr 28 '18 at 06:01
  • You might find my answer [here](https://stackoverflow.com/a/30567466/410767) helpful - it illustrates how entries get tracked in a hash table (though a good `unordered_map` implementation may do things slightly differently, e.g. GCC buckets store iterators into a singly linked list of key/value pairs so you can iterate elements without inspecting empty buckets). – Tony Delroy Apr 29 '18 at 01:25

2 Answers2

7

Fast element access does require some form of ordering. Unordered_map is called that way because the ordering might not make sense to a human and might not remain stable when you add or remove elements.

unordered_map is not faster than map because comparing hashes one to one is faster than comparing arbitrary objects one to one. It is faster because it doesn't need comparisons at all. This is why it doesn't need a compare template parameter.

The typical unordered_map implementation is a hash table. A hash table is mostly a regular array of key-value pairs that uses a clever trick to help you find the element you're looking for quickly.

An ideal hash function is uniformly distributed: if you were to pick a hash from any object at random, the value of hash % N for some integer N should be roughly uniform (pretending for a second that modulo bias doesn't exist). If you choose N to be the size of your array of key-value pairs, you can use hash(key) % size as an array index for quick lookup.

Since the hash value is supposed to be uniformly-distributed, different objects will usually have a different index, so things will usually work in your favor by doing just that. However, it's still possible that hash(key) % N is the same thing for two objects. In this case, the hash table needs to handle a collision: there are multiple strategies, but all of them typically devolve to a linear search within the keys that fell in the same hash bucket (and for that reason, the hash table needs to contain the key too, not just the hash of the key). This is why the worst-case access time for a hash table is O(n), andd it highlights the importance of having a good hash function.

In some cases, this can be a reason to prefer map over unordered_map, as the access performance of map (O(log n)) is very predictable.

Also, as the number occupied buckets in a hash table increases, the chances of a collision also increases. In general, for that reason, hash tables will have more buckets than elements, meaning that it's "wasting" space for efficiency.

zneak
  • 134,922
  • 42
  • 253
  • 328
  • Thank you very much. it's an awesome answer. I was completely thinking wrong. – wxShayan Apr 28 '18 at 08:25
  • This answer's interesting but a bit confusing because it both states that unordered_map doesn't have a compare template parameter, and that it performs a comparison (it actually does have such a parameter, as can be seen in question). AFAIK it cannot know if a lookup ended up in a collision or not, so it always needs to perform the comparison to check anyway. Typical implementation seems to be buckets corresponding to hashes then traversal of the bucket with comparison of keys. – foxesque Jul 27 '23 at 00:05
1

How the containers are implemented is, I believe, left to the implementer, however the standard may specify time complexity requirements for certain operations.

In practice most implementations of unordered_map are hash tables. In hash tables the entries are not simply sorted, but rather divided into buckets. The ordered map might be implemented as a tree instead, as suggested by for example this website.

Which collection you choose isn't dependent on just the type of key. Both have their advantages and tradeoffs in features, memory use and efficiency in different cases. However I would say that in the general case unordered_map is the better choice if you absolutely don't need ordered keys, because not guaranteeing order gives the implementation more freedom to implement things efficiently (hash tables generally have O(1) lookup performance). Additionally in most other programming languages the default mapping type doesn't guarantee order, so it seems to be a common choice.

The hash type unordered_map uses is specified as size_t which is just an integer, so it can just use standard integer operations to compare and do calculations with hashes.

Matti Virkkunen
  • 63,558
  • 9
  • 127
  • 159