
I need to map fixed-size strings, containing only alphanumeric characters (A-Z and 0-9, no lower-case letters), to other strings. The unordered_map becomes very large (tens of millions of keys), while the mapped values are drawn from a set of a few thousand strings. Upon profiling I found that most of the time is spent inserting new values into the map (operator[]), and that clearing the map also takes a very long time.

std::unordered_map<std::string, std::string> hashMap;
while (...){
    ...
    hashMap[key] = value;  // ~50% of program time is spent here
    ...
}

hashMap.clear();  // Takes a very long time, at this point hashMap.size() > 20,000,000

My suspicion is that the string allocations/deallocations are slow, as are the hashing and the inserts into the map. Any suggestions to optimize this? Keep in mind that the key size is constant and its contents are limited to an alphabet of 36 characters, and that the mapped values come from a limited set. I'm open to using container/data types other than strings and unordered_map.

Update

Following a suggestion by Baum mit Augen, I changed my key type to unsigned long long and wrote a function to convert base 36 to decimal:

// Converts a fixed-length 12-character base-36 key ('0'-'9', 'A'-'Z')
// to an integer; 36^12 < 2^64, so the result fits in 64 bits.
unsigned long long ConvertBase36(const char* num)
{
    unsigned long long retVal = 0;

    for (int i = 0; i < 12; i++)
    {
        unsigned int digit = 0;
        char currChar = num[i];
        if (currChar <= '9')
        {
            digit = currChar - '0';      // '0'-'9' -> 0-9
        }
        else
        {
            digit = currChar - 'A' + 10; // 'A'-'Z' -> 10-35
        }

        retVal *= 36;
        retVal += digit;
    }

    return retVal;
}

This gave me about a 10% improvement in whole-program runtime. I then tried the unordered_map reserve function again to see if it made any difference, and it did not. Trying map instead of unordered_map performed about 10% worse, so I reverted that change. Finally, replacing the string value with an unsigned int made things a bit faster.
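
For illustration, a minimal sketch of how these pieces might fit together; valuePool, valueIndex, and GetValueIndex are hypothetical names following Baum mit Augen's vector-of-values suggestion from the comments:

#include <string>
#include <unordered_map>
#include <vector>

std::vector<std::string> valuePool;                       // a few thousand distinct values
std::unordered_map<std::string, unsigned int> valueIndex; // value -> index in valuePool

// Interns a value string once and returns its small integer index.
unsigned int GetValueIndex(const std::string& value)
{
    auto it = valueIndex.find(value);
    if (it != valueIndex.end())
        return it->second;
    valuePool.push_back(value);
    unsigned int idx = static_cast<unsigned int>(valuePool.size() - 1);
    valueIndex.emplace(value, idx);
    return idx;
}

// The main map now stores only integers on both sides:
std::unordered_map<unsigned long long, unsigned int> hashMap;
// Insert:    hashMap[ConvertBase36(key)] = GetValueIndex(value);
// Read back: valuePool[hashMap.at(ConvertBase36(key))]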

Eyal K.
  • Just an idea, shooting from the hip: Use a std::array (fixed length) of char instead if a std::string as keys. Are the mapped values (also strings) of varying length? Maybe it is worth trying a std::map instead of an unordered map, just to see how it performs. Also, since the mapped values are much fewer, maybe it is worth storing smart pointers to them instead of full values (another commenter already suggested something along these lines). – Erik Alapää Aug 22 '16 at 09:49
  • Could you create a pool of your *values* (vector/deque?) and put pointers (or indexes) to those in your map? – Galik Aug 22 '16 at 09:50
  • How long are the keys? You could convert (substrings) to integer with strtol in base 36. What platform? ***Have you turned on optimization?*** – Martin Bonner supports Monica Aug 22 '16 at 09:50
  • You could also try increasing the bucket count, either on initialization or with `rehash(N)` to some much bigger number. Compare with your current (implementation-defined) `.bucket_count()` and play with it to see what kind of results you're getting. – Yam Marcovic Aug 22 '16 at 09:53
  • you also could create your own allocator for the string. with a clever preallocation you can heavily reduce the `new`'s/`delete`'s. – user1810087 Aug 22 '16 at 09:55
  • You'll want to write a wrapper class for your strings, with size of just a pointer (to the actual string). Then you'll also need a reverse mapping, `unordered_map` to actually keep your value strings and avoid duplicates. And then you'll want a wrapper class (perhaps just `std::array` of `char`) for your keys, too. Finally, if you know the size of the map beforehand, resize it beforehand, that should drop your 50% to maybe 30% right away. – hyde Aug 22 '16 at 09:55
  • @ErikAlapää std::map performs just as badly, with any gains in insertion being lost to slower finds. Clearing it is also not faster. I will try using std::array, but it will require some changes to my code, so I will get back to you on that – Eyal K. Aug 22 '16 at 09:55
  • @MartinBonner Full optimization is on, the platform is Windows 10 x64, Visual Studio 2013. The keys are 12 bytes long – Eyal K. Aug 22 '16 at 09:57
  • @EyalK. for 20 million keys, `std::map` should be a lot slower than `std::unordered_map`, not "just as bad". It is O(log N) vs amortized O(1) for insertion of a single new item. – hyde Aug 22 '16 at 09:59
  • @hyde I meant that it didn't improve things. My profiling doesn't time the functions, only shows where the program spends the most time. There is a lot of file I/O as well so things take a long time. – Eyal K. Aug 22 '16 at 10:03
  • @EyalK. OK, let us know how it goes. Avoiding duplicate values by using smart pointers has the potential to be a really big memory saver, at least. I often prefer std::map to hash map since logarithmic performance is almost always fast enough, and you get guaranteed performance 100% of the time (e.g. no rehashing etc), which is essential for hard real-time systems. – Erik Alapää Aug 22 '16 at 10:06
  • @hyde That's not necessarily the case. The hash may be implemented using a non-hierarchical data structure internally, and in some cases it might actually be inferior to the red-black tree of map. – Yam Marcovic Aug 22 '16 at 10:08
  • If you have just a couple of thousand values, you can just put them into a vector and change your map from key->value to key->index. That would save a lot of allocations if the values are too big for SSO. If there is some 128-bit integer type in MSVC, you can also use that as the key type instead, which might help hashing. – Baum mit Augen Aug 22 '16 at 10:08
  • @BaummitAugen The problem with that is that I would still need to do value->index conversion, which would require another map – Eyal K. Aug 22 '16 at 10:10
  • @EyalK. Yes, but then the map need not make millions of copies of value-strings, only millions of copies of value-integers. The latter is rather cheap. – Baum mit Augen Aug 22 '16 at 10:11
  • @YamMarcovic I believe C++ standard requires `unordered_map` average time complexity of O(1). Of course for suitable (intentionally crafted) data, it could be O(N) worst case. And of course the overhead is different, so for suitably small N, `std::map` may well be faster. However, ultimately, O(1) will trump O(log N), and 20000000 keys like here is probably well into that territory. – hyde Aug 22 '16 at 10:13
  • @hyde log(20000000) is less than 17, so probably not. At least that is not big enough for a blind assertion. – Baum mit Augen Aug 22 '16 at 10:15
  • @BaummitAugen Good point, I will try that as well – Eyal K. Aug 22 '16 at 10:15
  • @hyde For the *average* case, yes, but this might not be your average case. From the docs of `::at()`: "**Complexity:** Average case: constant, worst case: linear in size." – Yam Marcovic Aug 22 '16 at 10:17
  • "My profiling ... only shows where the program spends the most time. There is a lot of file I/O as well so things take a long time." Are you sure that the profiling is telling you the truth then? If there is a lot of file io going on, that may well be dominant - and nothing you do to the hashing will make any difference. – Martin Bonner supports Monica Aug 22 '16 at 10:18
  • @BaummitAugen For log2 that is actually 24. So 24 string comparisons (at least last of which will necessarily have to compare entire string) versus single hash value calculation and a few comparisons to find the value in the bucket, depending on load factor. So I would say it is *well* in the territory where hash table will outperform a tree. – hyde Aug 22 '16 at 10:20
  • @MartinBonner I'm looking at averages between different runs, and the hashing takes a long enough portion of the CPU time to make it very noticeable. – Eyal K. Aug 22 '16 at 10:23
  • @ErikAlapää I changed my keys to std::array but now the compiler is telling me that there is no standard defined hash for this type. Any suggestions here? I've never rolled my own hash function before – Eyal K. Aug 22 '16 at 10:24
  • @hyde The keys are short enough for SSO, so comparing them is probably cheap. Not saying the map is certainly faster, but I am not as confident calling one the winner as you are. If you save a couple of cache misses, that may already do it. – Baum mit Augen Aug 22 '16 at 10:24
  • @EyalK. Boost has you covered on that one. http://www.boost.org/doc/libs/1_61_0/doc/html/hash/reference.html – Baum mit Augen Aug 22 '16 at 10:26
  • Actually, I just noticed that your keys fit into 64-bit ints as `log2(36^12) < 64`. So if you can come up with a fast injective function keystring (or `char[12]`) -> `uint64_t`, your hash will become the identity and the aforementioned function may be faster than `std::hash` because it can use the size information. That would also make `std::map` faster as integer comparisons are dirt cheap, so you could try that as well. – Baum mit Augen Aug 22 '16 at 10:32
  • @BaummitAugen Not saying it's not worth testing, but in a tree map, basically every tree node lookup is a cache miss (at least when size is 20M keys), so being able to compare just integers won't help that much. Anyway, found some benchmark results: http://stackoverflow.com/a/25027750/1717300 – hyde Aug 22 '16 at 10:55
  • @EyalK. Maybe you can cook something together by combining these 2 links: http://en.cppreference.com/w/cpp/utility/hash and http://www.cse.yorku.ca/~oz/hash.html (a sketch along these lines appears just after this comment thread). – Erik Alapää Aug 22 '16 at 11:15
  • @BaummitAugen Used your suggestions, have updated the original post accordingly – Eyal K. Aug 22 '16 at 11:23
  • @EyalK. When you're building up the hash table, don't use the index operator. Instead, use insert. Even better would be to call emplace if that's available. You should get some better performance gains because it'll reduce the number of allocation/deallocation calls occurring within the implementation. – Andrew Aug 22 '16 at 12:38
  • @EyalK. Also, try calling reserve before inserting. This should help reduce the internal copying as your hash table grows. The input should be a value that's at least the number of expected elements in the table when fully populated. I personally try to pick a prime number larger than at least 1 sigma greater (no other reason than preference and math nerdiness), but a good value should be picked from testing and analysis of your data. – Andrew Aug 22 '16 at 12:53
  • @Andrew emplace does seem to make things a bit faster. I tried calling reserve before but it didn't make any noticeable difference. – Eyal K. Aug 22 '16 at 13:22
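
A minimal sketch of the hand-rolled hash discussed in the comments above; the KeyHash name and the FNV-1a scheme are illustrative assumptions, not necessarily what was ultimately used:

#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

using Key = std::array<char, 12>; // fixed-size key, no heap allocation

// FNV-1a over the 12 key bytes; one simple way to satisfy the Hash requirement.
struct KeyHash
{
    std::size_t operator()(const Key& k) const
    {
        std::uint64_t h = 14695981039346656037ULL; // FNV offset basis
        for (char c : k)
        {
            h ^= static_cast<unsigned char>(c);
            h *= 1099511628211ULL; // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// std::array already provides operator==, so only the hash must be supplied.
std::unordered_map<Key, std::string, KeyHash> arrayKeyMap;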

2 Answers


Two unrelated suggestions, but both related to std::unordered_map::reserve.


First, since your unordered map contains tens of millions of elements, there are probably many re-allocations/rehashes going on as you insert. At the start, you might want to reserve tens of millions of entries.


Second, since

the mapped values are from a set of a few thousand strings

you should be able to store the values themselves in a secondary unordered_set that you first reserve to something large enough to ensure no iterators get invalidated on inserts - see the invalidation guarantees for unordered associative containers.

Your (primary) unordered_map can then map strings to std::unordered_set::const_iterators.
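
A minimal sketch of this scheme, with illustrative names and reserve sizes:

#include <string>
#include <unordered_map>
#include <unordered_set>

std::unordered_set<std::string> values; // the few thousand distinct values
std::unordered_map<std::string,
                   std::unordered_set<std::string>::const_iterator> keyMap;

void Insert(const std::string& key, const std::string& value)
{
    // insert() returns an iterator to the new or pre-existing element;
    // unordered_set iterators stay valid across inserts as long as no
    // rehash occurs, which the reserve() calls below are meant to prevent.
    keyMap[key] = values.insert(value).first;
}

int main()
{
    values.reserve(10000);    // comfortably above the expected value count
    keyMap.reserve(20000000); // expected number of keys
    Insert("ABC123DEF456", "some value");
}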

Ami Tavory
  • I tried using reserve but it made no noticeable difference. After reading the docs, it seems this is effectively the same as the other answer's suggestion to change the bucket count. I don't think key hashes are clashing, otherwise I would expect to see a difference. I will try to change the value type and see if that does anything – Eyal K. Aug 22 '16 at 10:34
  • You are probably better off with a secondary vector instead of an unordered map, and map to indices instead. It saves the map-internal indirection, and a vector of a couple thousand strings easily fits into L3 cache (and maybe into L2 if you are lucky) on many processors. Needs measurement though, of course. – Baum mit Augen Aug 22 '16 at 10:42
  • @BaummitAugen That's an interesting suggestion, thanks (tree with eyes? I hardly speak German). – Ami Tavory Aug 22 '16 at 11:00
  • *"tree with eyes"* That's exactly what it means. :) – Baum mit Augen Aug 22 '16 at 11:38

For such a large number of entries, you might want to consider playing around with the number of buckets in your hash.

For starters, you can query the implementation-defined default with this:

unordered_map<T, U> um;
cout << um.bucket_count();

Then you can play around and see which value produces the best results:

const size_t N = 100;
unordered_map<T, U> m(N);
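
A minimal sketch of that experiment, with placeholder bucket counts to tune by measurement:

#include <iostream>
#include <string>
#include <unordered_map>

int main()
{
    std::unordered_map<std::string, std::string> m;
    std::cout << m.bucket_count() << '\n'; // implementation-defined default

    m.rehash(1 << 20); // or construct as unordered_map m(N), as above
    std::cout << m.bucket_count() << '\n'; // now at least 1 << 20
}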
Yam Marcovic
  • The implementation-defined bucket count is 8. Changing it to 100 didn't make any noticeable difference. – Eyal K. Aug 22 '16 at 10:07