
I am trying to compare STL map and STL unordered_map for certain operations. I looked on the net, but it only increased my doubts regarding which one is better as a whole. So I would like to compare the two on the basis of the operations they perform.

Which one performs faster in

Insert, Delete, Look-up

Which one takes less memory and less time to clear from memory? Any explanations are heartily welcome!

Thanks in advance

Anurag Sharma

4 Answers


Which one performs faster in insert, delete, look-up? Which one takes less memory and less time to clear from memory? Any explanations are heartily welcome!

For a specific use, you should try both with your actual data and usage patterns and see which is actually faster... there are enough factors that it's dangerous to assume either will always "win".

implementation and characteristics of unordered maps / hash tables

Academically, as the number of elements increases towards infinity, those operations on a std::unordered_map (the C++ library offering for what Computer Science terms a "hash map" or "hash table") will tend to keep taking the same amount of time, O(1) (ignoring memory limits/caching etc.), whereas with a std::map (a balanced binary tree), each time the number of elements doubles it will typically need to do an extra comparison operation, so it gets gradually slower, O(log₂n).

std::unordered_map implementations necessarily use open hashing (also known as separate chaining): the fundamental expectation is that there'll be a contiguous array of "buckets", each logically a container of any values hashing thereto.

It generally serves to picture the hash table as a vector<list<pair<key,value>>> where getting from the vector elements to a value involves at least one pointer dereference as you follow the list-head-pointer stored in the bucket to the initial list node; the insert/find/delete operations' performance depends on the size of the list, which on average equals the unordered_map's load_factor.
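As a rough illustration only (a toy sketch of the bucket-array-of-lists picture above, not how any real standard library implements it), it might look like this:

```cpp
#include <functional>
#include <list>
#include <utility>
#include <vector>

// Toy model of "vector<list<pair<key,value>>>"; real implementations differ
// in node layout and growth/rehashing policy (no rehashing is done here).
template <typename K, typename V>
struct ToyHashMap {
    std::vector<std::list<std::pair<K, V>>> buckets;

    ToyHashMap() : buckets(8) {}

    std::list<std::pair<K, V>>& bucket_for(const K& key) {
        return buckets[std::hash<K>{}(key) % buckets.size()];
    }

    V* find(const K& key) {
        for (auto& kv : bucket_for(key))           // walk the bucket's list
            if (kv.first == key) return &kv.second;
        return nullptr;
    }

    void insert(const K& key, const V& value) {
        if (V* existing = find(key)) { *existing = value; return; }
        bucket_for(key).emplace_back(key, value);  // average list length ~ load factor
    }
};
```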

If the max_load_factor is lowered (the default is 1.0), there will be fewer collisions but more reallocation/rehashing during insertion and more wasted memory (which can hurt performance through increased cache misses).
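For example (these are standard std::unordered_map members; the numbers are arbitrary), you can trade memory for fewer collisions up front:

```cpp
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> counts;
    counts.max_load_factor(0.5f); // aim for ~0.5 elements per bucket on average
    counts.reserve(10000);        // pre-size the bucket array so inserts don't rehash
    counts["example"] = 1;
}
```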

The memory usage for this most-obvious of unordered_map implementations involves both the contiguous array of bucket_count() list-head-iterator/pointer-sized buckets and one doubly-linked list node per key/value pair. Typically that amounts to bucket_count() + 2 * size() extra pointers of overhead, adjusted for any rounding-up of dynamic memory allocation request sizes the implementation might do (for example, if you ask for 100 bytes you might get 128, 256 or 512). An implementation's dynamic memory routines might use some memory for tracking the allocated/available regions too.
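You can inspect these quantities for whichever implementation you're using; a quick sketch:

```cpp
#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    for (int i = 0; i < 1000; ++i) m[i] = i;
    // Rough overhead per the description above: bucket_count() + 2 * size()
    // pointers, before any allocator rounding.
    std::cout << "size():         " << m.size()         << '\n'
              << "bucket_count(): " << m.bucket_count() << '\n'
              << "load_factor():  " << m.load_factor()  << '\n';
}
```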

Still, the C++ Standard leaves room for real-world implementations to make some of their own performance/memory-usage decisions. They could, for example, keep the old contiguous array of buckets around for a while after allocating a new larger array, so rehashing values into the latter can be done gradually to reduce the worst-case performance at the cost of average-case performance as both arrays are consulted during operations.

implementation and characteristics of maps / balanced binary trees

A map is a balanced binary tree (typically a red-black tree), and can be expected to employ pointers linking distinct heap memory regions returned by different calls to new. As well as the key/value data, each node in the tree needs parent, left, and right pointers (see Wikipedia's binary tree article if lost).
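An illustrative sketch of such a node (the names are made up; real library node types differ in detail):

```cpp
#include <utility>

// Hypothetical node layout for a std::map-style red-black tree.
template <typename K, typename V>
struct TreeNode {
    std::pair<const K, V> kv;  // the stored key/value pair
    TreeNode* parent;          // three linkage pointers per node...
    TreeNode* left;
    TreeNode* right;
    bool      is_red;          // ...plus a colour bit used for rebalancing
};
```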

comparison

So, both unordered_map and map need to allocate nodes for key/value pairs with the former typically having two-pointer/iterator overhead for prev/next-node linkage, and the latter having three for parent/left/right. But, the unordered_map additionally has the single contiguous allocation for bucket_count() buckets (== size() / load_factor()).

For most purposes that's not a dramatic difference in memory usage, and the deallocation time difference for one extra region is unlikely to be noticeable.

another alternative

For those occasions when the container's populated up front then repeatedly searched without further inserts/erases, it can sometimes be fastest to use a sorted vector, searched using the Standard algorithms binary_search, equal_range, lower_bound and upper_bound. This has the advantage of a single contiguous memory allocation, which is much more cache friendly. It typically outperforms map, but unordered_map may still be faster - measure if you care.
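A minimal sketch of that pattern, assuming the data really is built once and then only searched:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;

// Binary search over a sorted vector, comparing only the keys.
int* find(std::vector<Entry>& v, const std::string& key) {
    auto it = std::lower_bound(v.begin(), v.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    return (it != v.end() && it->first == key) ? &it->second : nullptr;
}

int main() {
    std::vector<Entry> v{{"banana", 2}, {"apple", 1}, {"cherry", 3}};
    std::sort(v.begin(), v.end());        // populate up front, sort once
    if (int* p = find(v, "cherry")) ++*p; // O(log n), single contiguous allocation
}
```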

Tony Delroy
  • Don't see why this: "The fundamental expectation is that there'll be a contiguous array of key/value 'buckets'" would be an expectation. Or how that helps, as I would always expect that the bucket storage area is larger than any internal cache, and because traversing the elements in a given order is likely to give you completely random buckets, cache locality is unlikely to be a factor. – Martin York Sep 13 '12 at 16:36
  • @LokiAstari: "Don't see why [...] would be an expectation" I mean that whenever I hear "we're using a hash table" I first picture a contiguous array sparsely populated with (key/value) pairs - something like that is my "fundamental expectation" and a useful point of departure as implementation details are added, e.g. buckets contain linked lists of key/values, free lists or locks stored in the buckets etc., multiple contiguous arrays to speed capacity extension. – Tony Delroy Sep 14 '12 at 06:00

The reason both exist is that neither is better as a whole.

Use either one. Switch if the other proves better for your usage.

  • std::map provides better space for worse time.
  • std::unordered_map provides better time for worse space.
Martin York
Drew Dormann
  • @LokiAstari I don't think this tradeoff is guaranteed by the standard. Isn't it conceivable that an implementation could provide an `unordered_map` with better memory usage than `map`? – Andrew Durward Sep 13 '12 at 16:20
  • @AndrewDurward: Yes there is always the possibility. But in the general case you are swapping space for time. – Martin York Sep 13 '12 at 16:38

The answer to your question is heavily dependent on the particular STL implementation you're using. Really, you should look at your STL implementation's documentation – it'll likely have a good amount of information on performance.

In general, though, according to cppreference.com, maps are usually implemented as red-black trees and support operations with time complexity O(log n), while unordered_maps usually support constant-time operations. cppreference.com offers little insight into memory usage; however, another StackOverflow answer suggests maps will generally use less memory than unordered_maps.

For the STL implementation Microsoft packages with Visual Studio 2012, it looks like map supports these operations in amortized O(log n) time, and unordered_map supports them in amortized constant time. However, the documentation says nothing explicit about memory footprint.

Benjamin Barenblat
  • "The answer to your question is heavily dependent on the particular STL implementation you're using" No it isn't. The complexity guarantees are *required* by the standard. They're not optional. – Nicol Bolas Sep 12 '12 at 01:59
  • 1
    @NicolBolas : The complexity guarantees are only half of it -- the OP also asked about memory usage. – ildjarn Sep 12 '12 at 02:08
  • 4
    The complexity guarantees are less than half of it. Constant factors matter in practice. And the standard says basically nothing about the performance of small sets and maps. Performance is complicated. – Jason Orendorff Sep 12 '12 at 02:10
  • @JasonOrendorff: The performance of small sets/maps is unlikely to be a bottleneck. The performance of large sets/maps is. Thus big O() notation is useful for that. This makes constant factors less important (so less than half), and the complexity guarantees more than half. Otherwise, if constant factors were as important as you suggest, the standard would have mentioned them. – Martin York Sep 13 '12 at 16:41
  • @LokiAstari An application may use many small sets and maps, and then their performance may matter. Certainly in Python, it turns out that the vast majority of dicts in real programs are quite small; so much so that the dict code contains special performance hacks for small tables. – Jason Orendorff Sep 14 '12 at 03:26
  • @LokiAstari Constant factors are certainly important. “X and Y are both O(n log n), but X is thirty times faster than Y” would be an example of a constant factor. The standard doesn’t specify particular constant factors for several reasons; one is that those constant factors depend on details of hardware and OS performance that are well beyond the C++ implementor’s control. – Jason Orendorff Sep 14 '12 at 03:48

Map:

Insertion:

  1. For the first version (insert(x)), logarithmic.
  2. For the second version (insert(position, x)), logarithmic in general, but amortized constant if x is inserted right after the element pointed to by position (see the sketch after these lists).
  3. For the third version (insert(first, last)), N·log(size+N) in general (where N is the distance between first and last, and size the size of the container before the insertion), but linear if the elements between first and last are already sorted according to the same ordering criterion used by the container.

Deletion:

  1. For the first version (erase(position)), amortized constant.
  2. For the second version (erase(x)), logarithmic in container size.
  3. For the last version (erase(first, last)), logarithmic in container size plus linear in the distance between first and last.

Lookup:

  1. Logarithmic in size.

Unordered map:

Insertion:

  1. Single element insertions:
    1. Average case: constant.
    2. Worst case: linear in container size.
  2. Multiple elements insertion:
    1. Average case: linear in the number of elements inserted.
    2. Worst case: N*(size+1): number of elements inserted times the container size plus one.

Deletion:

  1. Average case: linear in the number of elements removed (constant when you remove just one element).
  2. Worst case: linear in the container size.

Lookup:

  1. Average case: constant.
  2. Worst case: linear in container size.

Knowing these complexities, you can decide which container to use according to the operations your code performs most often.
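For instance, a small sketch of the second map-insertion form above (the hinted insert), assuming the keys arrive already in order:

```cpp
#include <map>

int main() {
    std::map<int, int> m;
    // With a good hint (here, keys inserted in ascending order just before
    // end()), each insert is amortized constant rather than logarithmic.
    for (int i = 0; i < 1000; ++i)
        m.emplace_hint(m.end(), i, i * i);
}
```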

Source: www.cplusplus.com

Rontogiannis Aristofanis