
I am trying to choose between map and unordered_map for the following use case:

The key of the map is a pointer. The most common use case is that there will be a single element in the map. In general, the maximum number of elements in the map is less than 10. The map is accessed very often and speed is the most important factor. Changes to the map are infrequent.

While measuring the speed is obviously the correct approach here, this code will be used on several platforms, so I'm trying to create a general rule of thumb for choosing between a map and an unordered_map based on the number of elements. I've seen some posts here hinting that std::map may be faster for a small number of elements, but no definition of "small" was given.

Is there a rule of thumb for when to choose between a map and unordered_map based on number of elements? Is another data structure (such as linear search through a vector) even better?

– default
  • Is the performance of this data structure so crucial that you need to even care about which of these you use? – Lily Ballard Mar 25 '13 at 21:38
  • Maybe not. Maybe that's why I haven't found an answer through searching, because it's a negligible difference. – default Mar 25 '13 at 21:39
  • This might depend on the runtime performance of the hash function vs. that of the compare function. – Philipp Mar 25 '13 at 21:40
  • @KevinBallard It can be important when you access a map in an inner loop. – Philipp Mar 25 '13 at 21:40
  • The question is probably driven mostly out of curiosity rather than a real need at this point. – default Mar 25 '13 at 21:40
  • I feel like vector might be faster for < 10 elements and infrequent changes because of data locality. – iefserge Mar 25 '13 at 21:46
  • [Boost::flatmap](http://www.boost.org/doc/libs/1_53_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx) – Mooing Duck Mar 25 '13 at 23:38
  • FWIW, I ended up changing from `map` to `unordered_map`, which might have given a performance benefit, but nothing crazy. Then a couple of years later we needed to improve performance even more, so I switched to a linear search through a `vector` and performance improved quite dramatically (we have a bunch of these maps in our system and they are accessed a lot). – default Nov 13 '15 at 15:23

2 Answers


Under the premise that you always need to measure in order to figure out what's more appropriate in terms of performance, if all these things are true:

  1. Changes to the map are not frequent;
  2. The map contains a maximum of 10 elements;
  3. Lookups will be frequent;
  4. You care a lot about performance;

Then I would say you would be better off putting your elements in an std::vector and performing a plain iteration over all your elements to find the one you're looking for.

An std::vector will allocate its elements in a contiguous region of memory, so cache locality is likely to grant you greater performance: the time required to fetch a cache line from main memory after a cache miss is at least an order of magnitude higher than the time required to access the CPU cache.
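To make that concrete, here is a minimal sketch of such a flat lookup table (the `Widget` pointee type and the `int` mapped type are placeholders, since the question only says the key is a pointer):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Widget;  // placeholder for whatever your pointer key points to

using Entry = std::pair<const Widget*, int>;  // int stands in for the mapped type

// Plain linear scan; with fewer than ~10 contiguous entries this usually
// touches only one or two cache lines.
int* find_value(std::vector<Entry>& table, const Widget* key) {
    auto it = std::find_if(table.begin(), table.end(),
                           [key](const Entry& e) { return e.first == key; });
    return it != table.end() ? &it->second : nullptr;
}
```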

Quite interestingly, it seems like Boost's flat_map is ideal for your use case (courtesy of Praetorian):

flat_map is similar to std::map but it's implemented like an ordered vector. (from the online documentation)

So if using Boost is an option for you, you may want to try this one.
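A usage sketch, assuming Boost.Container is available (same placeholder `Widget` type as above):

```cpp
#include <boost/container/flat_map.hpp>

struct Widget;  // placeholder pointee type, as above

// Same interface as std::map, but backed by a sorted contiguous array,
// so lookups are binary searches over cache-friendly storage.
boost::container::flat_map<const Widget*, int> table;

int lookup(const Widget* key, int fallback) {
    auto it = table.find(key);  // O(log n) over contiguous memory
    return it != table.end() ? it->second : fallback;
}
```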

– Andy Prowl
  • 4
    You'll probably also want to sort the vector (assuming infrequent updates) to give better branch prediction performance. – sfstewman Mar 25 '13 at 21:53
  • 4
    If the OP can use Boost, [flat_map](http://www.boost.org/doc/html/boost/container/flat_map.html) might be ideal for this use case. – Praetorian Mar 25 '13 at 22:05
  • @sfstewman: I can't figure out why would sorting the vector give better *branch prediction* performance. Could anyone help me understand? – yzt Mar 26 '13 at 00:05
  • @YaserZhian: See **[this Q&A](http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array)** I also linked in the related part of my answer, it's very informative! – Andy Prowl Mar 26 '13 at 00:06
  • @AndyProwl: Well actually, the way I see it, this case is quite different from typical data-dependent-branch-inside-a-loop situations. Here, we have a search which is a loop with an *equality* test inside it, which will *break* out of the loop if it evaluates to `true` (and it will evaluate to `true` exactly 0 or 1 time.) So, could you explain how that branch is going to be predicted? Because, the way I understand it, CPUs predict branch *patterns*, not a branch that is taken only once in a loop. – yzt Mar 26 '13 at 00:17
  • @YaserZhian: In fact I hadn't considered that all the lookups here are equality lookups. I guess I can remove that comment then. Thank you for pointing out – Andy Prowl Mar 26 '13 at 00:19
  • If you really hate branches, you can run `n lg n` comparisons, generate the bits of the index arithmetically, and then do an array lookup to extract the answer. In a 4 element container, `bool bits[] = {x[3] == y | x[1] == y, x[3] == y | x[2] == y}; int index=bits[0] + 2*bits[1];` -- which assumes that the data is in the pseudo-map, but lets you find it without a single branch. ;) – Yakk - Adam Nevraumont Mar 26 '13 at 02:15
  • 1
    @YaserZhian If the array is sorted (assuming an ordering exists), the comparisons do not have to be straight equality. For example, the loop could exit with the branch condition `*iter >= item`, with a subsequent equality test outside of the loop. In fact, using this with a sentinel item (strictly > all possible items) at the end of the list will combine the equality test and the test to exit the loop (`iter != list.end()`) into a single inequality. – sfstewman Mar 26 '13 at 03:00
  • 1
    @sfstewman: Thanks for the explanation. Still it doesn't change the fact that the branch will be taken exactly (or at most) once, which is useless (and/or impossible) to predict. I guess what we really want to achieve is to somehow hint to the CPU that the branch out is *not* taken. Combining the conditions into a single one using a sentinel is helpful too, as it removes a second branch (that is likely to fall in the shadow of the first.) However, don't most CPUs predict cold branches (ones they haven't seen yet) to *not be taken* when they are jumping forward? Thanks again! – yzt Mar 27 '13 at 05:52
  • @YaserZhian The branch is taken once, to exit the loop. However, modern CPUs have to predict whether the branch on each iteration through the loop, and make that prediction based on the previous iteration through the loop. This is why sorting the loop helps with branch prediction; the predicted behavior (don't exit the loop) holds until the element is found. – sfstewman Mar 27 '13 at 06:16
  • 1
    @sfstewman: But it would have held regardless of whether we sort the data or not, wouldn't it? I mean we *are* searching for one specific element and all other comparisons won't lead to the branch being taken, sorted data or not. Right? – yzt Mar 27 '13 at 09:43

I believe that for your case of 10 elements or fewer, and usually only one, a linear search of an unsorted vector will work best. However, depending on the hash algorithm used, the unordered_map may be faster instead.

It should be easy enough for you to benchmark.
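For instance, a rough micro-benchmark sketch along these lines (not a rigorous harness; the key count, iteration count, and `int` mapped type are arbitrary choices, and it needs C++14 for the generic lambda):

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
    constexpr int kKeys = 8;         // "fewer than 10 elements"
    constexpr int kIters = 1000000;  // lookups dominate, per the question

    int storage[kKeys];  // stand-ins for the real pointees
    std::map<int*, int> m;
    std::unordered_map<int*, int> um;
    std::vector<std::pair<int*, int>> v;
    for (int i = 0; i < kKeys; ++i) {
        m[&storage[i]] = i;
        um[&storage[i]] = i;
        v.emplace_back(&storage[i], i);
    }

    auto bench = [&](const char* name, auto lookup) {
        long long sink = 0;  // accumulated so the loop isn't optimized away
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < kIters; ++i)
            sink += lookup(&storage[i % kKeys]);
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("%-14s %8lld us (sink=%lld)\n", name, (long long)us, sink);
    };

    bench("map",           [&](int* k) { return m.find(k)->second; });
    bench("unordered_map", [&](int* k) { return um.find(k)->second; });
    bench("vector",        [&](int* k) {
        for (auto& e : v)
            if (e.first == k) return e.second;
        return -1;
    });
}
```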

– Zan Lynx