1

I've done some random tests but I couldn't come to a conclusion.

If inserting 1000000 integers in to a map and an unordered_map, the time used by map is 3 times larger.

If inserting 1000000 strings then the time used by map is 2 times larger.

Under what circumstances will std::unordered_map behave very slow?

Thanks in advance.

UPD:: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3). All tests got without -O2.

Code:

a.cpp: std::map<int, int> M; b.cpp: std::unordered_map<int, int> M;

g(i, 1, 1000000) {
    M[i] = rand() % i;
}

Results of my tests:

yyhs@yyhs-Pro:~/Documents$ g++ a.cpp -o a -g --std=c++11 && time ./a

real    0m0.659s
user    0m0.653s
sys 0m0.004s
yyhs@yyhs-Pro:~/Documents$ g++ b.cpp -o b -g --std=c++11 && time ./b

real    0m0.260s
user    0m0.251s
sys 0m0.008s

yyhs@yyhs-Pro:~/Documents$ g++ a.cpp -o a -g --std=c++11 -O2 && time ./a

real    0m0.290s
user    0m0.282s
sys 0m0.008s
yyhs@yyhs-Pro:~/Documents$ g++ b.cpp -o b -g --std=c++11 -O2 && time ./b

real    0m0.081s
user    0m0.081s
sys 0m0.000s

My question here is what cases might cause std::unordered_map become slow.

SCaffrey
  • 331
  • 1
  • 3
  • 14
  • 2
    You did not post what compiler you're using and what compiler options you used to build the test. If you're timing a "debug" or unoptimized build, the results are meaningless. – PaulMcKenzie Jun 23 '16 at 02:34
  • Thanks a lot @PaulMcKenzie. I've added the details. – SCaffrey Jun 23 '16 at 02:37
  • When you say "got without -O2", are you saying you're timing an unoptimized build? – PaulMcKenzie Jun 23 '16 at 02:38
  • `g++ a.cpp -o a -g --std=c++11` without `-O2`. (Instead of `g++ a.cpp -o a -g -O2 --std=c++11`) – SCaffrey Jun 23 '16 at 02:39
  • 1
    Then read my first comment. The results are meaningless. Time optimized builds. – PaulMcKenzie Jun 23 '16 at 02:39
  • 3
    A hash map is slow when you get lots of collisions, and/or evaluation of the hash function is sloooow. You can get the latter with long strings. The hash needs to consider the whole string, while checking the string against nodes in a tree may get away with considering only the first small part. – Cheers and hth. - Alf Jun 23 '16 at 02:39
  • Map works things out a bit faster with `-O2` but still almost the same. – SCaffrey Jun 23 '16 at 02:42
  • @SCaffrey The point is whether it is the same results or not, you should provide results from optimized builds if you're going to post timing information (or relative timing information). This will ensure that the people who are willing to answer are not wasting their time trying to solve a problem when there is no problem. – PaulMcKenzie Jun 23 '16 at 02:49
  • Possible duplicate of [Choosing between std::map and std::unordered\_map](http://stackoverflow.com/questions/3902644/choosing-between-stdmap-and-stdunordered-map) – QuestionC Jun 23 '16 at 02:59

1 Answers1

3

As usual, this will depend on the particular implementation, but this is not entirely true and the standard guarantees that std::unordered_map will outperform std::map asymptotically. It is only the constant factors that will vary from implementation to implementation. The insertion time for std::map is O(log N) and the average insertion time for std::unordered_map is O(1). See §23.4.4.1 and §23.5.4 in n3690 for details.

In general, std::unordered_map will outperform std::map by a large margin (as you observed) unless you have a lot of collisions. You can create collisions by choosing keys that get placed in the same bucket. This requires knowledge of your hash function and the mapping from hash values to buckets, but this knowledge can be used by attackers to make your program slow if they can control the keys in your hash table. For this reason, it is common to use randomized hash functions in exposed applications.

In pathological cases, std::map can outperform std::unordered_map if your hash function is chosen poorly (either very slow to evaluate or producing many collisions). This is extremely atypical.

As a minor note, the standard library std::unordered_map tends to be an open hash table in order to meet the C++ standard's requirements with respect to iterator behavior. This is known to be far from optimal for many applications, and there are a number of alternative hash table libraries out there which perform even better.

Dietrich Epp
  • 205,541
  • 37
  • 345
  • 415