
I’ve recently stumbled upon something odd: it seems that calculating Collatz sequence lengths with no caching at all is over 2 times faster than caching every element in a std::unordered_map.

Note that I did take hints from the question "Is gcc std::unordered_map implementation slow? If so - why?" and tried to use that knowledge to make std::unordered_map perform as well as I could: I used g++ 4.6 (it performed better than more recent versions of g++), and I specified a sensible initial bucket count, making it exactly equal to the maximum number of elements the map has to hold.
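
For reference, a minimal sketch (not from the original code) of what that pre-sizing amounts to: passing a bucket count to the constructor sets a minimum number of buckets, while reserve() sizes the table for a given number of elements, taking max_load_factor (1.0 by default) into account.

#include <cstdint>
#include <unordered_map>

int main() {
    // Illustrative only: pre-sizing via reserve(), roughly equivalent to passing
    // a bucket count to the constructor when max_load_factor is left at 1.0.
    std::unordered_map<std::uint_fast64_t, std::uint_fast16_t> cache = {{1, 1}};
    cache.max_load_factor(1.0f); // the default; lowering it trades memory for fewer collisions
    cache.reserve(2168611);      // pre-allocates buckets for this many elements, so no rehashing
}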

In comparison, using std::vector to cache a few elements was almost 17 times faster than no caching at all and almost 40 times faster than using std::unordered_map.
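
The vector code itself isn't included in this question; a minimal sketch of such a vector-based cache (the bound and exact structure below are illustrative assumptions, not necessarily the code behind the 17× figure) stores lengths for small starting values contiguously and indexes them directly by value:

#include <cstdint>
#include <vector>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    // Cache only values below a fixed bound, in a flat array indexed by value.
    constexpr std::uint_fast64_t cacheSize = 1000000; // assumed bound
    static std::vector<std::uint_fast16_t> cache(cacheSize, 0); // 0 means "not computed yet"

    if(val < cacheSize && cache[val] != 0)
        return cache[val];

    std::uint_fast16_t length;
    if(val == 1)
        length = 1;
    else if(val%2 == 0)
        length = getCollatzLength(val/2) + 1;
    else
        length = getCollatzLength(3*val+1) + 1;

    if(val < cacheSize)
        cache[val] = length;
    return length;
}

The same main loop as in the benchmarks below can drive it unchanged.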

Am I doing something wrong, or is this container really THAT slow, and why? Can it be made to perform faster? Or are hash maps inherently inefficient and best avoided whenever possible in high-performance code?

The problematic benchmark is:

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    static std::unordered_map <std::uint_fast64_t, std::uint_fast16_t> cache ({{1,1}}, 2168611); // seeded with {1,1}; initial bucket count equals the maximum number of elements the map will hold

    if(cache.count(val) == 0) {
        if(val%2 == 0)
            cache[val] = getCollatzLength(val/2) + 1;
        else
            cache[val] = getCollatzLength(3*val+1) + 1;
    }

    return cache[val];
}

int main()
{
    std::clock_t tStart = std::clock();

    std::uint_fast16_t largest = 0;
    for(int i = 1; i <= 999999; ++i) {
        auto cmax = getCollatzLength(i);
        if(cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';

    std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}

It outputs: Time taken: 0.761717

Whereas a benchmark with no caching at all:

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    std::uint_fast16_t length = 1;
    while(val != 1) {
        if(val%2 == 0)
            val /= 2;
        else
            val = 3*val + 1;
        ++length;
    }
    return length;
}

int main()
{
    std::clock_t tStart = std::clock();

    std::uint_fast16_t largest = 0;
    for(int i = 1; i <= 999999; ++i) {
        auto cmax = getCollatzLength(i);
        if(cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';

    std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}

It outputs: Time taken: 0.324586

  • @WhiZTiM Am I doing something wrong or is this container THAT slow and why? –  Mar 03 '17 at 20:57
  • `unordered_map` just *is* pretty slow for a lot of uses. Most of the time `std::vector` is the fastest container. Data locality > everything else. – Baum mit Augen Mar 03 '17 at 21:00
  • @WhiZTiM Edited the question, better now? –  Mar 03 '17 at 21:00
  • If you're having complexity issues, I feel bad for you son, std::unordered_map returns in Big-O(N). [(worst case)](http://stackoverflow.com/questions/3902644/choosing-between-stdmap-and-stdunordered-map) – JGroven Mar 03 '17 at 21:02
  • Have you checked the actual values? The use of a uint_fast16_t cache value IMO is suspicious given that half of 999999 is greater than 65536. – Art Yerkes Mar 03 '17 at 21:02
  • Also you're doing two lookups; one in .count, one for operator []. You can get an iterator from find that will serve both purposes. – Art Yerkes Mar 03 '17 at 21:04
  • @ArtYerkes Max length is 525, can’t exceed 16bit int. These are lengths of Collatz sequences, a sequence starting at `999999` won’t have a length of `999999`. –  Mar 03 '17 at 21:05
  • @ArtYerkes `Also you're doing two lookups; one in .count, one for operator [].` Ikr but shouldn’t -O3 deal with this? Lemme check. –  Mar 03 '17 at 21:06
  • @ArtYerkes `You can get an iterator from find that will serve both purposes.` Um no `find()` will give me `end()` if there is no such element, I can’t write to it :( –  Mar 03 '17 at 21:11
  • The insert case should be relatively rare. You can use operator [] for that: do a single `find()` and fall back to `operator[]` only when the element is missing (the snippet from this comment is reproduced more readably after this thread). – Art Yerkes Mar 03 '17 at 21:17
  • @ArtYerkes You mean something like that: http://melpon.org/wandbox/permlink/H2gKPi06XbCqFbIJ ? Still no performance gain :( –  Mar 03 '17 at 21:25
  • I see a small perf gain: 0.883951 old vs 0.805454 new. Still not as much as I would have guessed. – Art Yerkes Mar 03 '17 at 21:28
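
For readability, here is the single-lookup variant from that comment reconstructed as a block (the template arguments, which the comment formatting stripped, are assumed to be the same as in the original cache):

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    static std::unordered_map<std::uint_fast64_t, std::uint_fast16_t> cache({{1,1}}, 2168611);

    // Single lookup via find(); operator[] is only used on the rare insert path.
    auto it = cache.find(val);
    if(it == cache.end()) {
        if(val%2 == 0)
            return cache[val] = getCollatzLength(val/2) + 1;
        else
            return cache[val] = getCollatzLength(3*val+1) + 1;
    }
    return it->second;
}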

1 Answer


The standard library's maps are, indeed, inherently slow (std::map especially, but std::unordered_map as well). Google's Chandler Carruth explains this in his CppCon 2014 talk; in a nutshell, std::unordered_map is cache-unfriendly because it is node-based, using linked lists for its buckets.

This SO question mentioned some efficient hash map implementations - use one of those instead.
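
As an illustration of the kind of replacement meant here (the particular library is my own pick, not necessarily one discussed in the linked question), an open-addressing map such as absl::flat_hash_map can be dropped in with almost the same interface:

#include <cstdint>
#include "absl/container/flat_hash_map.h" // Abseil's open-addressing hash map

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    // Same memoization as in the question, but entries live in a contiguous
    // table instead of one heap-allocated node per element.
    static absl::flat_hash_map<std::uint_fast64_t, std::uint_fast16_t> cache = {{1, 1}};

    auto it = cache.find(val);
    if(it != cache.end())
        return it->second;

    std::uint_fast16_t length = (val%2 == 0)
        ? getCollatzLength(val/2) + 1
        : getCollatzLength(3*val+1) + 1;
    cache[val] = length;
    return length;
}

The question's benchmark loop can stay exactly as it is; only the container and the lookup pattern change.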

einpoklum
  • The linked talk is really an excellent video to watch. That guy has a much more in-depth understanding of performance issues than the vast majority of people who think they know about performance issues. – cmaster - reinstate monica Oct 02 '20 at 08:40