
I am using (i, j, k) coordinates as the key in an unordered_map. But as I insert more keys, it slows down dramatically: inserting 10x more keys takes 100x more time, or even worse. The unordered_map is now the bottleneck of my program, so I want to find out how to improve it. The following is a minimal reproducible example of my problem:

#include <unordered_map>
#include <ctime>
#include <cstdio>
#include <cstdlib>

using namespace std;

// define the key
struct key
{
    int i;
    int j;
    int k;
    key(int a,int b,int c): i(a),j(b),k(c){}
    bool operator==(const key& other_key) const{
        return i==other_key.i && j==other_key.j && k==other_key.k;
    }
};

// define a simple hash function for unordered_map
struct key_hash
{
    size_t operator()(const key& o) const{
        return o.i ^ o.j ^ o.k;
    }
};

// define unordered_map
unordered_map<key,int,key_hash> memory;

int main(int argc,char **argv){
    // accept arguments about how many keys should be inserted
    int I=atoi(argv[1]);
    int J=atoi(argv[2]);
    int K=atoi(argv[3]);

    double st=clock();
    for(int i=0;i<I;i++){
        for(int j=0;j<J;j++){
            for(int k=0;k<K;k++){
                memory[{i,j,k}]=i+j+k;
            }
        }
    }
    double et=clock();
    printf("time: %f",(et-st)/CLOCKS_PER_SEC);
}

I tried the following arguments and got these results:

(I, J, K)         time (sec)
(1, 100, 100)     0.013
(10, 100, 100)    1.285
(100, 100, 100)   327.178

The shocking thing is that when I increase I by 10x, the time increases by 100x or even more. I also ran the same logic in Python and got a much more reasonable and much better result. Here is the Python code:

import time
if __name__=='__main__':
    I=100
    J=100
    K=100
    memory={}
    st=time.time()
    for i in range(I):
        for j in range(J):
            for k in range(K):
                memory[(i,j,k)]=i+j+k
    et=time.time()
    print('time: {}'.format(et-st))

The Python results are in the following table:

(I, J, K)         time (sec)
(1, 100, 100)     0.006
(10, 100, 100)    0.043
(100, 100, 100)   0.318

So in Python the time grows at the same rate as I, which is reasonable, and the total time is also significantly lower.

In conclusion, is my code wrong, or is unordered_map not suitable for my purpose? And how can I solve this?

  • Your hash function makes the same hash for lots of keys – user253751 Jun 15 '22 at 10:35
  • Yes, bad hash function. Use a better `hash_combine`, see e.g. [this question](https://stackoverflow.com/questions/2590677/how-do-i-combine-hash-values-in-c0x) or `boost::hash_combine`. – user17732522 Jun 15 '22 at 10:41
  • Also [Why is XOR the default way to combine hashes?](https://stackoverflow.com/questions/5889238/why-is-xor-the-default-way-to-combine-hashes). – user17732522 Jun 15 '22 at 10:44
  • Any permutation of `i`, `j` and `k` results in the same hash. With values up to 100 only the 7 least significant bits are filled, resulting in 2^7 possible hashes, which results in an average of 10^6 / 2^7 = 7812.5 keys per hash. – fabian Jun 15 '22 at 10:47
  • (A) above for the hash and (B) was the testing done with optimisation on? Please include the compilation flags in the question. – Richard Critten Jun 15 '22 at 10:51
  • Because of the bad hash key, performance tends towards O(n) linear search of a vector. You may have better performance with a sorted vector and binary search. – Eljay Jun 15 '22 at 10:59
  • Usually hash keys are calculated from summing values multiplied by relative prime numbers (which are ideally relative prime to the size of the container). E.g. try 3*i + 7*j + 11*k for your hash function. – Pepijn Kramer Jun 15 '22 at 11:30
  • The performance has much improved after using good hash function `boost::hash_combine`. Thanks everyone! – Qiu Junyan Jun 15 '22 at 11:40
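For reference, here is a minimal sketch of what a `hash_combine`-style `key_hash` could look like, hand-rolled with the well-known boost::hash_combine mixing formula so it does not require Boost. It is illustrative only, not necessarily the exact code the asker ended up using; it reuses the `key` struct from the question.

#include <cstddef>
#include <functional>

// Drop-in replacement for key_hash from the question: mix each coordinate
// into a running seed, so that permutations of (i, j, k) and small values
// no longer collapse onto a handful of hash values.
struct key_hash
{
    static void hash_combine(std::size_t& seed, int v)
    {
        seed ^= std::hash<int>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    std::size_t operator()(const key& o) const
    {
        std::size_t seed = 0;
        hash_combine(seed, o.i);
        hash_combine(seed, o.j);
        hash_combine(seed, o.k);
        return seed;
    }
};

With a well-distributed hash like this, insertion time should grow roughly linearly with the number of keys, in line with the Python results above.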
