
C++ by default provides a tree-based map. With Boost you can get a hash map.

What are the advantages and disadvantages of

  1. C++'s tree-based map, and

  2. Boost's hash map?

jmasterx
  • possible duplicate of [Binary Trees vs. Linked Lists vs. Hash Tables](http://stackoverflow.com/questions/371136/binary-trees-vs-linked-lists-vs-hash-tables) – Nemo Jul 07 '11 at 19:12
  • Tree = O(log(N)), hash = O(k) and O(1) with respect to N. – Damon Jul 07 '11 at 19:14
  • @Nemo: I don't think this is a duplicate because the question asks about the differences between a tree based map and boost's hashmap. –  Jul 07 '11 at 19:18
  • @0A0D: The question may not be exactly the same, but the answers there definitely answer the question here. – Nemo Jul 07 '11 at 21:11
  • @Nemo: That may be true but that's not how duplicate questions are defined on SO. –  Jul 07 '11 at 21:13

2 Answers


C++0x/TR1 also provides `unordered_map`, which is usually implemented as a hash map.

The differences are twofold:

  1. The key type. In the ordered map, the key type must obey a strict weak ordering, and entries are maintained in that order. In the unordered map, the key type must be equality-comparable and you must provide a hash function h such that h(Key) returns size_t [thanks to Steve Jessop for the clarification].

  2. Access complexity: Insert/delete/find in an ordered map is O(log n) in the map size n. In the unordered map, it is "usually" O(1), but worst-case behaviour is O(n) (e.g. if all keys map to the same hash value).

So the ordered map provides an unconditional complexity guarantee, while the unordered map provides better complexity in good cases, depending on the quality of your hash function.
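
To make those two requirements concrete, here is a minimal sketch (C++11; the `PersonId` key type and the `PersonIdHash` functor are made up purely for illustration): the ordered map only needs a `<` comparison on the key, while the unordered map needs `==` plus a hash callable returning `size_t`.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical key type, for illustration only.
struct PersonId {
    int value;
};

// std::map requires a strict weak ordering on the key (used via std::less).
bool operator<(const PersonId& a, const PersonId& b) { return a.value < b.value; }

// std::unordered_map requires equality plus a hash returning size_t.
bool operator==(const PersonId& a, const PersonId& b) { return a.value == b.value; }

struct PersonIdHash {
    std::size_t operator()(const PersonId& id) const {
        return static_cast<std::size_t>(id.value);
    }
};

int main() {
    std::map<PersonId, std::string> ordered;                            // O(log n) insert/find
    std::unordered_map<PersonId, std::string, PersonIdHash> unordered;  // average O(1)

    ordered[PersonId{42}] = "Alice";
    unordered[PersonId{42}] = "Alice";

    std::cout << ordered.find(PersonId{42})->second << '\n';
    std::cout << unordered.find(PersonId{42})->second << '\n';
}
```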

The internal implementation complexity of the unordered map is greater than that of the ordered map, but you can think of the better access complexity as the payoff for getting fewer features: you don't get sorting for free. It's a classic trade-off.

Another point: in practice, if the ordering comparison is expensive to compute, as it is for strings, the unordered map may actually be quite a bit faster, because comparing hash values is very cheap. On the other hand, if your key type has a trivial hash function (like any built-in integral type) and you don't need the ordering, consider using an unordered container.

Kerrek SB
  • @Kerrick: Do you mean `operator size_t(const Key &)`? And can you cite the section of TR1 (or C++0x) that says all you have to do is provide an `operator size_t`? (Not saying you are wrong; just saying I do not see it. I thought you had to specialize `hash` to use your own Key type.) – Nemo Jul 07 '11 at 21:19
  • No no, not `operator size_t`, just any function of signature `size_t (const Key &)`. Like `size_t hash_my_int(const int & x) { return x; }`. You have to specify the function in the container, `unordered_set`, or specialize `std::hash`. – Kerrek SB Jul 07 '11 at 21:20
  • @Kerrik: OK. You might want to make that clear in your answer, since `size_t(const Key &)` is not a well-formed expression :-). – Nemo Jul 07 '11 at 21:25
  • @Nemo: Well, it's just supposed to be a signature. Maybe `size_t()(const Key &)`, eh? But `std::function` works... – Kerrek SB Jul 07 '11 at 21:26
  • The requirement on hashes is expressed in terms of the necessary expression rather than a signature. If `h` is the hash and `k` is a key, then `h(k)` must return `size_t`, but it's up to you how it gets there. It's legal for it to take the parameter by value rather than by const reference, for example, or it could take a completely different type that `k` converts to. – Steve Jessop Jul 07 '11 at 22:09
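
To illustrate the two routes discussed in the comments above, here is a minimal sketch (C++11; the `Ticket` type is hypothetical): instead of passing a hasher as an extra template argument, as in the previous example, you can specialize `std::hash` for your key type and let the container pick it up by default.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical key type, for illustration only.
struct Ticket {
    int number;
    bool operator==(const Ticket& other) const { return number == other.number; }
};

// Alternative to a hasher template argument: specialize std::hash for the key type.
namespace std {
template <>
struct hash<Ticket> {
    size_t operator()(const Ticket& t) const { return static_cast<size_t>(t.number); }
};
}

int main() {
    // No explicit hasher parameter needed; std::hash<Ticket> is found by default.
    std::unordered_set<Ticket> tickets;
    tickets.insert(Ticket{7});
    return tickets.count(Ticket{7}) == 1 ? 0 : 1;
}
```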

Hash tables provide very fast search, insertion, and deletion of objects: the complexity for these operations is on average O(1), meaning constant time. The main limitation on these operations is the speed of the hash function (for some types of objects that are not PODs, a good hash function, one that avoids "collisions" where two different objects hash to the same value, can be a bit complex and take more time to compute). The main penalty for a hash table is that it requires a lot of extra space.

Binary trees, on the other hand, have relatively quick insertion and search times, and the complexity of deleting an object is the same as that of insertion. Because of the way a binary search tree works, where each node has at most two child nodes, search and access (as well as insertion and deletion) take O(log N) time. So binary trees are "slower" than hash tables, but they are not as complex to implement (although balanced binary search trees are more complex than unbalanced trees).

Another side benefit of a binary search tree is that by iterating through the container from the "first" element to the "last" element, you get a sorted list of objects, whereas with the hash map that list would not be sorted. So the extra time for insertion also accounts for the fact that insertion into a binary search tree is a sorted insertion. For instance, the complexity of quicksorting a group of N items is the same as that of building a balanced binary search tree (e.g., a red/black tree) from the same N items: both operations are O(N log N).
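
As a quick illustration of that side benefit, a minimal C++11 sketch: iterating a `std::map` visits the keys in sorted order, while a `std::unordered_map` makes no ordering guarantee.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

int main() {
    std::map<std::string, int> ordered{{"cherry", 3}, {"apple", 1}, {"banana", 2}};
    std::unordered_map<std::string, int> hashed{{"cherry", 3}, {"apple", 1}, {"banana", 2}};

    // The tree-based map iterates in key order: apple, banana, cherry.
    for (const auto& kv : ordered)
        std::cout << kv.first << ' ';
    std::cout << '\n';

    // The hash map iterates in an unspecified, implementation-dependent order.
    for (const auto& kv : hashed)
        std::cout << kv.first << ' ';
    std::cout << '\n';
}
```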

Jason