5

std::unordered_map guarantees O(1) time search, but how does it manage collision?

Cppreference claims

Unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity.

Assuming a situation where all the Hash codes are same, how is the collision handled internally?

My assumption would be totally wrong if the hash code is unique to every key. In that case how is the unique hash code created where there are no collisions at all?

What approach does std::unordered_map's hash function take to guarantee O(1) search?

erip
  • 16,374
  • 11
  • 66
  • 121
user2256825
  • 594
  • 6
  • 22

3 Answers3

10

It doesn't guarantee O(1), it's O(1) on average... Worst case it can be O(n) when there are a lot of collisions. Please see link below, for more info:

https://stackoverflow.com/a/2771398/5874704

Update

Since the question has been edited, and now asks specifically about collisions for std::unordered_map, please have a look at the following answer:

https://stackoverflow.com/a/21519560/5874704

I think we can conclude that all practical implementations of std::unordered_set (or unordered_map) almost certainly use collision chaining. While it might be (just barely) possible to meet the requirements using linear probing or double hashing, such an implementation seems to lose a great deal and gain nearly nothing in return.

Community
  • 1
  • 1
A.Fagrell
  • 1,052
  • 8
  • 21
3

There was an omission from your post that is crucial to understand: std::unordered_map has average-case O(1) search. It can take up to O(n) in the number of elements in the map to retrieve the element.

As for which hash function it uses - this is up to the user. By default it uses std::hash.

The only requirement on the hashing function with respect to collision handling is

Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision DoS attacks. (cppreference)

erip
  • 16,374
  • 11
  • 66
  • 121
  • Please explain the downvote. I can't learn anything if there's no feedback. – erip May 04 '16 at 13:52
  • 2
    (Not my downvote) It's not amortized, it's average. "Amortized" implies a guarantee. – T.C. May 04 '16 at 19:23
  • @T.C. I guess I'd been operating under the wrong assumption for awhile. :) Edited. Thanks for the note! – erip May 04 '16 at 19:25
1

std::unordered_map guarantees O(1) time search, but how does it manage collision?

It uses open addressing / separate chaining, see here.

Cppreference claims

Unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity.

Assuming a situation where all the Hash codes are same, how is the collision handled internally?

The colliding elements are added into another container holding all values that hashed to that bucket. That container is usually a linked list, but there's nothing stopping an implementation using e.g. a binary tree.

My assumption would be totally wrong if the hash code is unique to every key. In that case how is the unique hash code created where there are no collisions at all?

unordered_map isn't required or expected to do anything special to avoid collisions. (Hash codes being "unique to every key" doesn't suffice anyway, as collisions can be created when hash codes are masked or mod-ed into the number of buckets.)

What approach does std::unordered_map's hash function take to guarantee O(1) search?

This is the crux of your misunderstanding. unordered_map has O(1) performance when the hash function does an adequate job of hashing the keys across the buckets. It may degrade to O(n) if the hash function is poor, or has been deliberately targeted by a malicious input of keys known to hash to the same bucket. The Standard does not require implementations to prevent that, but users can supply a cryptographic hash, pick a hash function from a family at runtime, or otherwise make it impractical for a malicious user - or similar inputs generally - to create many more collisions.

Community
  • 1
  • 1
Tony Delroy
  • 102,968
  • 15
  • 177
  • 252
  • if it uses a binary tree, it would nOt be any different than std::map(ordered) , as this uses binary tree , which guarantees o(log n) and i this is the major difference between ordered_map and unordered_map(Link List with amortized constant time and O( Log n ) ). correct me if iam wrong – user2256825 May 04 '16 at 14:54
  • @user2256825: correction coming up ;-) - unordered_map may use a list or binary tree for values *hashing to the same bucket* - that shouldn' t be confused with a map that *only* has a binary tree and not the hash table / buckets step. You might find my answer [here](http://stackoverflow.com/a/30567466/410767) useful as it illustrates the division between the hash table and the logical container (it illustrates a linked list) storing (possible multiple colliding) elements at each bucket. – Tony Delroy May 04 '16 at 15:08
  • Can you elaborate please? :) – user2256825 May 04 '16 at 15:12
  • @user2256825: was that request from before or after I mentioned my linked answer? – Tony Delroy May 04 '16 at 15:13
  • Before :) the linked answer – user2256825 May 04 '16 at 15:17
  • i understood how linked list is used when a collision happens , but my confusion was how unordered map may use both link list and binary test as per your comment above. If there is a collision either binary tree is used to store values or a linked list , how can both be used in unordered map. – user2256825 May 04 '16 at 15:23
  • 1
    @user2256825 it doesn't need to use both, and in practice all C++ implementations I've looked at use linked lists, but there's nothing in the Standard requiring that. FWIW, some Java implementations use trees. Re "as per [my] comment above" - what did I say that caused this confusion? I've reread what I've written above and couldn't spot anything.... – Tony Delroy May 04 '16 at 15:46
  • Probably I got confused with this. Unordered tree may use list or a binary tree for hashing values to same bucket .... And I still did not get this – user2256825 May 04 '16 at 17:17
  • I don't see how you can use a tree when you don't have a way of ordering the items. – T.C. May 04 '16 at 19:18
  • @As per my understanding , tree may be used at a specific index of an array where the collision might have taken place , and the index can be a mod of array size or might have some logic to find an index of an array and during collision either a binary tree or a linked list may be used .bur I dint really understand what is way of ordering .. Is that the hash function logic ? Can you please elaborate – user2256825 May 04 '16 at 19:24
  • Step 1 : find an index using a hash function Step:2 : create a tree at that particular and insert element(key,value) If collision happens after step 1 , new element is inserted in the same index's binary tree. What excatly is way of ordering here – user2256825 May 04 '16 at 19:29
  • @T.C. you could order by the hash value (but then need a "multimap" style tree accommodating same-hash entries), or use SFINAE to see if there's a member `operator<`, but that might be a bit dodgy given there's no requirement on the value type requiring the usual semantic meaning of `<`. – Tony Delroy May 05 '16 at 04:12
  • @user2256825: as I mention to T.C. above, and you suggest yourself above, the hash value could indeed be the basis of ordering within a tree. – Tony Delroy May 05 '16 at 04:15
  • Ah, OK, handle the case where different hashes are mapped to the same container, but not when different keys have same hash. – T.C. May 05 '16 at 05:02
  • @T.C. exactly; the former will usually be massively more common with any good hash function.... – Tony Delroy May 05 '16 at 05:50