
I wrote some code to test the performance of my unordered map with a 2-component vector as the key.

std::unordered_map<Vector2i, int> m;

// fill the map with 1000*1000 entries
for(int i = 0; i < 1000; ++i)
    for(int j = 0; j < 1000; ++j)
        m[Vector2i(i,j)] = i*j+27*j;

// time a single lookup
clock.restart();

auto found = m.find(Vector2i(0,5));

std::cout << clock.getElapsedTime().asMicroseconds() << std::endl;

Output for the code above: 56 microseconds. When I replace 1000 in the for loops with 100, the output is 2 microseconds. Isn't the lookup time supposed to be constant?

The hash function for my Vector2i:

namespace std
{
    template<>
    struct hash<Vector2i>
    {
        std::size_t operator()(const Vector2i& k) const
        {
            // XOR-combine the hashes of the two components
            return (hash<int>()(k.x)) ^ (hash<int>()(k.y) << 1);
        }
    };
}

EDIT: I added this code after the insertion loop to count the collisions:

std::size_t collisions = 0;
for (std::size_t bucket = 0; bucket != m.bucket_count(); ++bucket)
    if (m.bucket_size(bucket) > 1)
        ++collisions;   // counts buckets holding more than one element

With 100*100 elements: collisions = 256

With 1000*1000 elements: collisions = 2048

  • If you have a lot of hash collisions, your hash map degenerates into a linked list; maybe that's what happened. – nwp Dec 30 '15 at 21:36
  • Try adding some padding to your type so that it occupies an entire cache line. At small sizes, cache locality may be the dominating factor. – Kerrek SB Dec 30 '15 at 21:36
  • @nwp: Excellent point; the OP should also output a histogram of all hash values and see if there are any collisions. – Kerrek SB Dec 30 '15 at 21:37
  • The hash function does not seem to be so great, btw. – SergeyA Dec 30 '15 at 21:40
  • I counted 256 collisions with the 100 loop and 2048 collisions with the 1000 loop. – aqww Dec 30 '15 at 21:50
  • @aqww Which is almost a 10-fold increase in the number of collisions... So the time increase for finding is not so surprising anymore, especially if you happen to "hit" one of the elements that have a lot of collisions. – vsoftco Dec 30 '15 at 21:52
  • I see, so I need a better hash function I guess – aqww Dec 30 '15 at 21:53
  • @aqww Yes, which is an art by itself ;) [This (and links within)](http://stackoverflow.com/questions/8513911/how-to-create-a-good-hash-combine-with-64-bit-output-inspired-by-boosthash-co) may be useful. – vsoftco Dec 30 '15 at 21:53
  • Which compiler are you using? Visual C++ uses power-of-2 bucket counts which will be worse with such a weak hash function, while GCC uses primes. The number of buckets with collisions is not the most meaningful thing to count - how deep those collision chains get is very important. And, you're only timing a single lookup, so for your current benchmark it's only the `bucket_size` for `m.bucket(Vector2i(0,5))` that's relevant. Anyway, try `boost::hash_combine` or the [algo therefrom](http://stackoverflow.com/questions/29883840/boosthash-combine-vs-simple-xoring?lq=1) - it'll help. – Tony Delroy Dec 31 '15 at 07:35
  • @TonyD I use the GCC compiler. And you're right, my collision counting is not good; a better way would be `collisions += bucket_size - 1` for every bucket (see the sketch just below). `hash_combine`, as suggested by the answer, fixed my problem (now I have almost constant time). – aqww Jan 01 '16 at 01:01
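
A minimal sketch of that per-bucket count, assuming the same map `m` as in the question; every element beyond the first in a bucket is treated as one collision, so deep chains are counted fully:

std::size_t collisions = 0;
for (std::size_t b = 0; b != m.bucket_count(); ++b)
    if (m.bucket_size(b) > 0)
        collisions += m.bucket_size(b) - 1;   // each extra element in a bucket is one collision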

1 Answer


A hash table guarantees constant amortized time. If the hash table is well balanced (i.e., the hash function is good), then the elements will be evenly distributed across the buckets. However, if the hash function is not so good, you may have lots of collisions, in which case accessing an element requires traversing a list (usually a linked list) of the elements that collided. So first make sure the load factor and the hash function are OK in your case. Lastly, make sure you compile your code in release mode, with optimizations turned on (e.g. -O3 for g++/clang++).
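
As a quick illustration of that check (a sketch only, reusing the map `m` and the lookup key from the question), the container itself exposes everything needed: the load factor, the bucket count, and the size of the bucket a given key hashes to:

std::cout << "load factor:  " << m.load_factor()  << '\n';
std::cout << "bucket count: " << m.bucket_count() << '\n';
// length of the chain that m.find(Vector2i(0,5)) actually has to walk
std::cout << "bucket size for (0,5): "
          << m.bucket_size(m.bucket(Vector2i(0,5))) << '\n';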

This question may also be useful: [How to create a good hash_combine with 64 bit output (inspired by boost::hash_combine)](http://stackoverflow.com/questions/8513911/how-to-create-a-good-hash-combine-with-64-bit-output-inspired-by-boosthash-co).
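
For illustration, a sketch of what a `boost::hash_combine`-style specialization could look like for the `Vector2i` from the question (the constant 0x9e3779b9 and the shift amounts come from the Boost implementation; the member names `x`/`y` follow the question's code):

namespace std
{
    template<>
    struct hash<Vector2i>
    {
        std::size_t operator()(const Vector2i& k) const
        {
            // combine the two component hashes the way boost::hash_combine does
            std::size_t seed = std::hash<int>()(k.x);
            seed ^= std::hash<int>()(k.y) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
            return seed;
        }
    };
}

With GCC's `std::hash<int>`, which essentially returns the value itself, the original `x ^ (y << 1)` maps e.g. (0,0) and (2,1) to the same value; the combining step above avoids that kind of systematic collision.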

  • Also, 2 and 56 microseconds can't be used as a reliable measure. At this order of magnitude of timing, the noise from the OS and hardware will be dominant. – bolov Dec 30 '15 at 21:58
  • @bolov True, one should test for relatively larger times/number of elements, and scale them appropriately. Although it may give you a hint that something is fishy (a rough averaging sketch follows these comments). – vsoftco Dec 30 '15 at 22:00
  • I implemented the hash_combine thing; now I have 1 microsecond with 100*100 elements and 3 microseconds with 1000*1000. But something is wrong with the collision number... It outputs 258409 collisions for the 1000 loop and 2192 for the 100 loop. I think I am not counting them as I should. – aqww Dec 30 '15 at 22:42
  • @aqww The timing scaling looks much better, although one cannot draw a clear conclusion from such super small times, see bolov's comment. Try to increase the number of elements 1000 times and test for a much larger time. Try to use [`std::unordered_map::bucket_size`](http://en.cppreference.com/w/cpp/container/unordered_map/bucket_size) to see how many elements per bucket you have (i.e. collisions). – vsoftco Dec 30 '15 at 22:45
  • 3 microseconds for 10000*10000, 21612599 collisions. Yes, I used bucket_size; see my edit, I posted the code. – aqww Dec 30 '15 at 22:53
  • @aqww Well... then I'd say try to find a *random* element in both cases, because you may end up searching for an element that lives in a very short bucket, so the access time is small even though the collision rate is high. One thing to learn from all of this is that there is no perfect data structure. Every one is good in some particular cases and bad in others. – vsoftco Dec 30 '15 at 22:55
  • @aqww See the unordered_map::load_factor() method for counting collisions. – Anthony Akentiev Dec 30 '15 at 23:00
  • @AnthonyAkentiev I first thought the same. But `load_factor` may trick you: e.g., you may have a load factor of 1, yet everything ends up in one bucket. `load_factor` just divides the number of stored elements by the number of buckets. I am curious if there is any "good" function (or proposal) to properly count the number of collisions. – vsoftco Dec 30 '15 at 23:01
  • @vsoftco how come? *load_factor = size / bucket_count* – Anthony Akentiev Dec 30 '15 at 23:03
  • @AnthonyAkentiev Yes, but what if you have `n` buckets in an `n`-sized hash, but only the first bucket is loaded like crazy? Or 1 single bucket, with `n` entries? I may be wrong though, I am not super familiar with hash tables. The wikipedia article is quite nice also: https://en.wikipedia.org/wiki/Hash_table#Key_statistics. – vsoftco Dec 30 '15 at 23:06
  • @AnthonyAkentiev load factor is 0.94668 (for 1000*1000 elements) – aqww Dec 30 '15 at 23:06
  • @vsoftco I changed the value of the element to find several times; the output stays the same. – aqww Dec 30 '15 at 23:11
  • @vsoftco "The load factor is the ratio between the number of elements in the container (its size) and the number of buckets (bucket_count)" i.e. *load_factor = size / bucket_count*. You will get high load_factor if some of your buckets will be "loaded like a crazy". For example: total items=100. bucket_count=2. First bucket contains 99 items, second only 1. load_factor = 100 /2 = 50. Very high. You can't get "empty" buckets. bucket_count is only about occupied. AFAIK – Anthony Akentiev Dec 30 '15 at 23:12
  • @AnthonyAkentiev Sorry, I meant buckets with only 1 element, not empty. Although I agree with you (and also mentioned the load factor in my answer) that in general it is a pretty good indicator. – vsoftco Dec 30 '15 at 23:13
  • My load factor has exactly the same value (0.94668) for my first hash function (with 1000*1000 elements), but the numbers of collisions are different. Weird... – aqww Dec 30 '15 at 23:33
  • @aqww That's because the load factor is not 100% in agreement with the collision count. But I'd say this is a very good opportunity to learn how those hash maps work and what their limitations are. Try testing, plotting, then drawing conclusions. You may even try to contribute a self-answered question on StackOverflow if you feel you got the testing right; it will probably be quite useful. – vsoftco Dec 30 '15 at 23:35
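
Following the advice in these comments, a rough benchmarking sketch (an illustration only; it replaces the question's clock with `std::chrono` and assumes `m` has already been filled with the 1000*1000 entries from the question) that times many random lookups and reports the average:

#include <chrono>
#include <cstdlib>
#include <iostream>

// ... fill m as in the question ...

const int lookups = 1000000;
long long sink = 0;                       // consume the results so the loop is not optimized away
auto start = std::chrono::steady_clock::now();
for (int n = 0; n < lookups; ++n)
{
    // random keys, so one unusually short (or long) bucket does not dominate the measurement
    auto it = m.find(Vector2i(std::rand() % 1000, std::rand() % 1000));
    if (it != m.end())
        sink += it->second;
}
auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
              std::chrono::steady_clock::now() - start).count();
std::cout << "average lookup: " << ns / double(lookups) << " ns (sink = " << sink << ")\n";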