
Sure, the lookup performance of an unordered_map is constant on average, and the lookup performance of a map is O(log N).

But of course in order to find an object in an unordered_map, we have to:

  1. hash the key we want to find.
  2. equality_compare the key with every key in the same bucket.

Whereas in a map, we simply need to less_than compare the sought key with about log2(N) keys, where N is the number of items in the map.

I wondered what the real performance difference would be, given that the hash function adds overhead and an equality_compare is no cheaper than a less_than compare.

Rather than bother the community with a question I could answer myself, I wrote a test.

I have shared the results below, in case anyone else finds this interesting or useful.

More answers are of course invited if someone is able and willing to add more information.

Richard Hodges
  • The problem with `map` is not the `log N` per se; it's that each memory access as you walk the tree is essentially random. This is not important when the map is small, but it dominates when the map is large. (The difference between accessing cache and memory is an order of magnitude or two; see e.g. http://stackoverflow.com/q/4087280/. And this difference tends to increase across CPU generations because the relevant physics is local.) The equal-to/less-than operations are unnoticeable compared to the pointer chasing. – Nemo Apr 04 '16 at 03:01
  • @Nemo Have a look at my test results, in particular the flat_map vs map. It seems on first glance that the pointer chasing is accounting for a doubling in lookup time in a map when compared to a (large!) sorted vector. However, there may be other factors at play here. clang seems more willing to inline the entire search for `lower_bound` on a vector, than `at` for a map, for example. – Richard Hodges Apr 04 '16 at 07:07

2 Answers


In response to questions about performance in relation to the number of missed searches, I have refactored the test to parameterise this.

Example results:

searches set_size miss ordered unordered flat_map
1000000 0 100% 4384 12901 681
1000000 99 99.99% 89127 42615 86091
1000000 172 99.98% 101283 53468 96008
1000000 303 99.97% 112747 53211 107343
1000000 396 99.96% 124179 59655 112687
1000000 523 99.95% 132180 51133 121669
1000000 599 99.94% 135850 55078 121072
1000000 695 99.93% 140204 60087 124961
1000000 795 99.92% 146071 64790 127873
1000000 916 99.91% 154461 50944 133194
1000000 988 99.9% 156327 54094 134288

Key:

searches = number of searches performed against each map
set_size = how big each map is (and therefore how many of the searches will result in a hit)
miss = the probability that a given search will not find its key. Used when generating the searches, and therefore determines set_size.
ordered = the time spent searching the ordered map
unordered = the time spent searching the unordered_map
flat_map = the time spent searching the flat map

note: time is measured in std::chrono::system_clock::duration ticks.

TL;DR

Results: the unordered_map shows its superiority as soon as there is data in the map. The only time it exhibits worse performance than the ordered map is when the maps are empty.

Here's the new code:

#include <iostream>
#include <iomanip>
#include <random>
#include <algorithm>
#include <string>
#include <vector>
#include <map>
#include <unordered_map>
#include <unordered_set>
#include <chrono>
#include <tuple>
#include <future>
#include <stdexcept>
#include <sstream>
#include <memory> // for std::make_unique

using namespace std;

// this sets the length of the string we will be using as a key.
// modify this to test whether key complexity changes the performance ratios
// of the various maps
static const size_t key_length = 20;

// the number of keys we will generate (the size of the test)
const size_t nkeys = 1000000;



// use a virtual method to prevent the optimiser from detecting that
// our sink function actually does nothing. otherwise it might skew the test
struct string_user
{
    virtual void sink(const std::string&) = 0;
    virtual ~string_user() = default;
};

struct real_string_user : string_user
{
    virtual void sink(const std::string&) override
    {
        
    }
};

struct real_string_user_print : string_user
{
    virtual void sink(const std::string& s) override
    {
        cout << s << endl;
    }
};

// generate a sink from a string - this is a runtime operation and therefore
// prevents the optimiser from realising that the sink does nothing
std::unique_ptr<string_user> make_sink(const std::string& name)
{
    if (name == "print")
    {
        return make_unique<real_string_user_print>();
    }
    if (name == "noprint")
    {
        return make_unique<real_string_user>();
    }
    throw logic_error(name);
}

// generate a random key, given a random engine and a distribution
auto gen_string = [](auto& engine, auto& dist)
{
    std::string result(key_length, ' ');
    generate(begin(result), end(result), [&] {
        return dist(engine);
    });
    return result;
};

// comparison predicate for our flat map.
struct pair_less
{
    bool operator()(const pair<string, string>& l, const string& r) const {
        return l.first < r;
    }

    bool operator()(const string& l, const pair<string, string>& r) const {
        return l < r.first;
    }
};

template<class F>
auto time_test(F&& f, const vector<string> keys)
{
    auto start_time = chrono::system_clock::now();
    
    for (auto const& key : keys)
    {
        f(key);
    }
    
    auto stop_time = chrono::system_clock::now();
    auto diff =  stop_time - start_time;
    return diff;
}

struct report_key
{
    size_t nkeys;
    int miss_chance;
};

std::ostream& operator<<(std::ostream& os, const report_key& key)
{
    return os << "miss=" << setw(2) << key.miss_chance << "%";
}

void run_test(string_user& sink, size_t nkeys, double miss_prob)
{
    // the types of map we will test
    unordered_map<string, string> unordered;
    map<string, string> ordered;
    vector<pair<string, string>> flat_map;
    
    // a vector of all keys, which we can shuffle in order to randomise
    // access order of all our maps consistently
    vector<string> keys;
    unordered_set<string> keys_record;

    // generate keys
    auto eng = std::default_random_engine(std::random_device()());
    // note: uniform_int_distribution<char> is not required to work by the
    // standard (MSVC rejects it), so generate ints and narrow to char
    auto alpha_dist = std::uniform_int_distribution<int>('A', 'Z');
    auto prob_dist = std::uniform_real_distribution<double>(0, 1.0 - std::numeric_limits<double>::epsilon());
    
    auto generate_new_key = [&] {
        while(true) {
            // generate a key
            auto key = gen_string(eng, alpha_dist);
            // try to store it in the unordered map
            // if it already exists, force a regeneration
            // otherwise also store it in the ordered map and the flat map
            if(keys_record.insert(key).second) {
                return key;
            }
        }
    };
    
    for (size_t i = 0 ; i < nkeys ; ++i)
    {
        auto value = to_string(i);

        auto key = generate_new_key();
        if (prob_dist(eng) >= miss_prob) {
            unordered.emplace(key, value);
            flat_map.emplace_back(key, value);
            ordered.emplace(key, std::move(value));
        }
        // record the key for later use
        keys.emplace_back(std::move(key));
    }
    // turn our vector 'flat map' into an actual flat map by sorting it by pair.first. This is the key.
    sort(begin(flat_map), end(flat_map),
         [](const auto& l, const auto& r) { return l.first < r.first; });
    
    // shuffle the keys to randomise access order
    shuffle(begin(keys), end(keys), eng);
    
    auto unordered_lookup = [&](auto& key) {
        auto i = unordered.find(key);
        if (i != end(unordered)) {
            sink.sink(i->second);
        }
    };
    
    auto ordered_lookup = [&](auto& key) {
        auto i = ordered.find(key);
        if (i != end(ordered)) {
            sink.sink(i->second);
        }
    };
    
    auto flat_map_lookup = [&](auto& key) {
        auto i = lower_bound(begin(flat_map),
                             end(flat_map),
                             key,
                             pair_less());
        if (i != end(flat_map) && i->first == key) {
            sink.sink(i->second);
        }
    };
    
    // spawn a thread to time access to the unordered map
    auto unordered_future = async(launch::async,
                                  [&]()
                                  {
                                      return time_test(unordered_lookup, keys);
                                  });
    
    // spawn a thread to time access to the ordered map
    auto ordered_future = async(launch::async, [&]
                                {
                                    return time_test(ordered_lookup, keys);
                                });
    
    // spawn a thread to time access to the flat map
    auto flat_future = async(launch::async, [&]
                             {
                                 return time_test(flat_map_lookup, keys);
                             });
    
    // synchronise all the threads and get the timings
    auto ordered_time = ordered_future.get();
    auto unordered_time = unordered_future.get();
    auto flat_time = flat_future.get();
    
    cout << "searches=" << setw(7) << nkeys;
    cout << " set_size=" << setw(7) << unordered.size();
    cout << " miss=" << setw(7) << setprecision(6) << miss_prob * 100.0 << "%";
    cout << " ordered=" << setw(7) << ordered_time.count();
    cout << " unordered=" << setw(7) << unordered_time.count();
    cout << " flat_map=" << setw(7) << flat_time.count() << endl;

}

int main()
{
    // generate the sink, preventing the optimiser from realising what it
    // does.
    stringstream ss;
    ss << "noprint";
    string arg;
    ss >> arg;
    auto puser = make_sink(arg);
    
    for (double chance = 1.0 ; chance >= 0.0 ; chance -= 0.0001)
    {
        run_test(*puser, 1000000, chance);
    }
    
    
    return 0;
}
Emile Cormier
Richard Hodges
  • Your benchmark program has been super helpful to me. I adapted it to check the performance difference between map and unordered_map with int keys. Here is the time for making 1,000,000 lookups on 100-element containers with no misses; I do that with a for loop where i%100 is looked up. The result I get: ordered=259130usec unordered=125470usec. In other words, a 100-int unordered_map is roughly 2x faster than map! This has been tested with gcc 11.2 compiled in c++20 mode – lano1106 Apr 09 '22 at 15:17

In the following test, which I compiled on apple clang with -O3, I have taken steps to ensure that the test is fair, such as:

  1. calling a sink function with the result of each search through a vtable, to prevent the optimiser inlining away entire searches!

  2. running the tests on 3 different kinds of maps, containing the same data, in the same order, in parallel. This means that if one test starts to 'get ahead', it starts entering cache-miss territory for the search set (see code), so no one test gets the unfair advantage of encountering a 'hot' cache.

  3. parameterising the key size (and therefore complexity)

  4. parameterising the map size

  5. testing three different kinds of maps (containing the same data): an unordered_map, a map, and a sorted vector of key/value pairs.

  6. checking the assembler output to ensure that the optimiser has not been able to optimise away entire chunks of logic due to dead code analysis.

Here is the code:

#include <iostream>
#include <random>
#include <algorithm>
#include <string>
#include <vector>
#include <map>
#include <unordered_map>
#include <chrono>
#include <tuple>
#include <future>
#include <stdexcept>
#include <sstream>
#include <memory> // for std::make_unique

using namespace std;

// this sets the length of the string we will be using as a key.
// modify this to test whether key complexity changes the performance ratios
// of the various maps
static const size_t key_length = 20;

// the number of keys we will generate (the size of the test)
const size_t nkeys = 1000000;


// the types of map we will test
unordered_map<string, string> unordered;
map<string, string> ordered;
vector<pair<string, string>> flat_map;

// a vector of all keys, which we can shuffle in order to randomise
// access order of all our maps consistently
vector<string> keys;

// use a virtual method to prevent the optimiser from detecting that
// our sink function actually does nothing. otherwise it might skew the test
struct string_user
{
    virtual void sink(const std::string&) = 0;
    virtual ~string_user() = default;
};

struct real_string_user : string_user
{
    virtual void sink(const std::string&) override
    {
        
    }
};

struct real_string_user_print : string_user
{
    virtual void sink(const std::string& s) override
    {
        cout << s << endl;
    }
};

// generate a sink from a string - this is a runtime operation and therefore
// prevents the optimiser from realising that the sink does nothing
std::unique_ptr<string_user> make_sink(const std::string& name)
{
    if (name == "print")
    {
        return make_unique<real_string_user_print>();
    }
    if (name == "noprint")
    {
        return make_unique<real_string_user>();
    }
    throw logic_error(name);
}

// generate a random key, given a random engine and a distribution
auto gen_string = [](auto& engine, auto& dist)
{
    std::string result(key_length, ' ');
    generate(begin(result), end(result), [&] {
        return dist(engine);
    });
    return result;
};

// comparison predicate for our flat map.
struct pair_less
{
    bool operator()(const pair<string, string>& l, const string& r) const {
        return l.first < r;
    }

    bool operator()(const string& l, const pair<string, string>& r) const {
        return l < r.first;
    }
};

int main()
{
    // generate the sink, preventing the optimiser from realising what it
    // does.
    stringstream ss;
    ss << "noprint";
    string arg;
    ss >> arg;
    auto puser = make_sink(arg);
    
    // generate keys
    auto eng = std::default_random_engine(std::random_device()());
    // note: uniform_int_distribution<char> is not required to work by the
    // standard (MSVC rejects it), so generate ints and narrow to char
    auto alpha_dist = std::uniform_int_distribution<int>('A', 'Z');
    
    for (size_t i = 0 ; i < nkeys ; ++i)
    {
        bool inserted = false;
        auto value = to_string(i);
        while(!inserted) {
            // generate a key
            auto key = gen_string(eng, alpha_dist);
            // try to store it in the unordered map
            // if it already exists, force a regeneration
            // otherwise also store it in the ordered map and the flat map
            tie(ignore, inserted) = unordered.emplace(key, value);
            if (inserted) {
                flat_map.emplace_back(key, value);
                ordered.emplace(key, std::move(value));
                // record the key for later use
                keys.emplace_back(std::move(key));
            }
        }
    }
    // turn our vector 'flat map' into an actual flat map by sorting it by pair.first. This is the key.
    sort(begin(flat_map), end(flat_map),
         [](const auto& l, const auto& r) { return l.first < r.first; });
    
    // shuffle the keys to randomise access order
    shuffle(begin(keys), end(keys), eng);

    // spawn a thread to time access to the unordered map
    auto unordered_future = async(launch::async, [&]()
                                  {
                                      auto start_time = chrono::system_clock::now();

                                      for (auto const& key : keys)
                                      {
                                          puser->sink(unordered.at(key));
                                      }
                                      
                                      auto stop_time = chrono::system_clock::now();
                                      auto diff =  stop_time - start_time;
                                      return diff;
                                  });
    
    // spawn a thread to time access to the ordered map
    auto ordered_future = async(launch::async, [&]
                                {
                                    
                                    auto start_time = chrono::system_clock::now();
                                    
                                    for (auto const& key : keys)
                                    {
                                        puser->sink(ordered.at(key));
                                    }
                                    
                                    auto stop_time = chrono::system_clock::now();
                                    auto diff =  stop_time - start_time;
                                    return diff;
                                });

    // spawn a thread to time access to the flat map
    auto flat_future = async(launch::async, [&]
                                {
                                    
                                    auto start_time = chrono::system_clock::now();
                                    
                                    for (auto const& key : keys)
                                    {
                                        auto i = lower_bound(begin(flat_map),
                                                               end(flat_map),
                                                               key,
                                                               pair_less());
                                        if (i != end(flat_map) && i->first == key)
                                            puser->sink(i->second);
                                        else
                                            throw invalid_argument(key);
                                    }
                                    
                                    auto stop_time = chrono::system_clock::now();
                                    auto diff =  stop_time - start_time;
                                    return diff;
                                });

    // synchronise all the threads and get the timings
    auto ordered_time = ordered_future.get();
    auto unordered_time = unordered_future.get();
    auto flat_time = flat_future.get();
 
    // print
    cout << "  ordered time: " << ordered_time.count() << endl;
    cout << "unordered time: " << unordered_time.count() << endl;
    cout << " flat map time: " << flat_time.count() << endl;
    
    return 0;
}

Results:

  ordered time: 972711
unordered time: 335821
 flat map time: 559768

As you can see, the unordered_map convincingly beats the map and the sorted pair vector. The vector of pairs is roughly twice as fast as the map solution. This is interesting, since lower_bound and map::at have almost equivalent asymptotic complexity.

TL;DR

In this test, the unordered map is approximately 3 times as fast (for lookups) as an ordered map, and a sorted vector convincingly beats a map.

I was actually shocked at how much faster it is.

Richard Hodges
  • Since you've taken the effort to parameterise the size, it might be a good idea to put your results in a table with columns for various sizes. This should reveal a rough threshold of the point at which the higher overheads of unordered_map are superseded by the lower Order of Complexity. – Disillusioned Apr 03 '16 at 23:55
  • @CraigYoung the thought crossed my mind. But when I changed the parameters manually, it did not make much difference to the ratios. I found this surprising too. – Richard Hodges Apr 03 '16 at 23:57
  • Even for a size like 10? – Disillusioned Apr 04 '16 at 00:23
  • @CraigYoung map size or key size? – Richard Hodges Apr 04 '16 at 00:25
  • @CraigYoung reducing the key size to 10 produced these results: ordered time: 938733 unordered time: 335819 flat map time: 938673 The ratio is the same. – Richard Hodges Apr 04 '16 at 00:27
  • I'm referring to changing the value of `nkeys`. – Disillusioned Apr 04 '16 at 01:01
  • @CraigYoung with maps that small the test is so quick that it's immeasurable. – Richard Hodges Apr 04 '16 at 01:27
  • It looks like your "flat map" test actually searches both the sorted vector and the ordered map. So I'm a little surprised that it has the same timing. Actually - that might have to do with running the tests concurrently. I'd personally feel better if the tests were not run concurrently to eliminate contention as a factor, Also, the flat map test shouldn't do anything with the `ordered` object (unless I'm misunderstanding something). – Michael Burr Apr 04 '16 at 01:34
  • @MichaelBurr ah yes, that's a bug. I expect the binary search is being completely eliminated. I'll fix that in the morning. There should be no contention as there are no mutexes (everything is read only during the tests). However it would be trivial to make the tests sequential. I did this in the first version. There was no performance difference. – Richard Hodges Apr 04 '16 at 01:37
  • @MichaelBurr fixed. interesting new results. the flat_map is twice as fast as the map. – Richard Hodges Apr 04 '16 at 01:49
  • @RichardHodges: You have no explicit mutexes in code, but there's still only a limited number of processors, which _are_ in contention. – Mooing Duck Apr 04 '16 at 01:55
  • @MooingDuck this laptop has 4 cores and there are 3 threads running in the test at any one time. Any contention will be memory cache only. But I accept that on another machine it might make a difference. However, as mentioned, earlier versions ran the tests sequentially. The results were the same. – Richard Hodges Apr 04 '16 at 02:00
  • @Richard re: "test is so quick that it's immeasurable" The whole point of examining orders of complexity is to understand the performance implications for different values of N. In particular, the benefit of lower orders of complexity is that no matter how large the constant overheads may be, there will be a threshold value N from which the lower order of complexity starts to perform better. _In order to test with lower values of N, you need to repeat the test a number of times in order to get measurable results_. – Disillusioned Apr 04 '16 at 02:05
  • PS: For the record, you're testing a fairly specific and probably abnormal usage pattern: Build up a map and lookup each entry _exactly once_ with _zero_ missed searches. Furthermore, your timing excludes the time required to build the maps. – Disillusioned Apr 04 '16 at 02:10
  • @RichardHodges: I tweaked the code to do ~100000 lookups no matter the number of keys to measure small sets, and did each test 6 times in rotating order to reduce ordering effects, and use the minimum of the 6 runs to remove interference from other processes. I put the tests themselves in a template to share code, and removed the virtual overhead. http://coliru.stacked-crooked.com/a/5d7eba2301100274 I didn't have a convenient way to run them locally so I ran on Coliru, but got weird results :( http://coliru.stacked-crooked.com/a/46378fe1c78f1ee9 – Mooing Duck Apr 04 '16 at 02:43
  • @MooingDuck Your coliru test was not compiled with optimisations on. I copied that code and ran it locally (optimised). The test completed instantly. Remember that very carefully crafted code I put in to defeat the optimiser...? :-) – Richard Hodges Apr 04 '16 at 07:01
  • @MooingDuck posted new answer to address concerns – Richard Hodges Apr 04 '16 at 09:04
  • @CraigYoung posted new answer to address concerns – Richard Hodges Apr 04 '16 at 09:05
  • I just ran the benchmark on GCC and MSVC. With about 20 items in the map, unordered goes 2x as fast with GCC but already near 3x with MSVC. MSVC did complain about `std::uniform_int_distribution` but it works with ``. – Octo Poulos Dec 31 '22 at 15:02