I am trying to profile a CPU-intensive part of my program.
It stunned me that unordered_map's isContain, insert, and remove are really slow.
Here is a real example of the profiling results:
Physic_Object* cave = ... it is a function argument ...
if (cave->id == 3) {                      //< 0.1% simple int compare
    int asdfdssd = 0;
}
//"phy_concaves" is MyEncapsulatorOf_unordered_map<Physic_Object*> (it is a field)
//There are about 500-1000 elements in the unordered_map.
if (phy_concaves.isContain(cave)) {       // 0.8% unordered_map "isContain"
    phy_concaves.remove_immediate(cave);  // 4.9% unordered_map "isContain" + "erase"
    cave->finalize_inverse();             // 6.0% a lot of unordered_map functions
    delete cave;                          // 0.7% simple delete
} else {
}
Here is my cheap hash function.
//Physic_Object.h
public: virtual long hqHashCode() const override final {
    return id;
    //"id" is incremented by 1 every time a new Physic_Object is created.
    //I have a static int "id_runner" to track it.
}
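For context, this is roughly how the id is assigned (a simplified sketch, not my exact code; "HqHashable" is a placeholder name for the real base class that declares hqHashCode(), only id and id_runner are real):
//Simplified sketch of the id assignment.
class HqHashable {
public:
    virtual long hqHashCode() const = 0;
    virtual ~HqHashable() = default;
};
class Physic_Object : public HqHashable {
    static int id_runner;                 //shared counter
public:
    int id;
    Physic_Object() : id(id_runner++) {}  //every new object gets the next id
    long hqHashCode() const override final { return id; }
    //... physics members omitted ...
};
int Physic_Object::id_runner = 0;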
According to the whole profile, unordered_map operations are gobbling up about 70%-80% of my game's CPU time.
Question: Is unordered_map really that slow, or is there something wrong with my code?
More specifically, is it common for an insertion/deletion to cost around 40+ simple arithmetic operations?
Here is how I create the unordered_map. I don't think there is anything wrong with it, but I am providing it just in case.
//MyEncapsulatorOf_unordered_map<K> , K can be a pointer or a value
template<typename T2> static
const T2* ptr(T2 const& obj) { return &obj; }  //turn a reference into a pointer
template<typename T2> static
T2* ptr(T2* obj) { return obj; }               //obj is already a pointer, return it

template<class T2> class structTPtr1 {         //hash functor
public:
    std::size_t operator()(T2 const& t) const {
        std::size_t h1 = ptr(t)->hqHashCode();
        return h1;
    }
};
template<class T2> class structTPtr2 {         //equality functor
public:
    bool operator()(const T2& t1, const T2& t2) const {
        return t1 == t2;
    }
};
public:
std::unordered_set<K, structTPtr1<K>, structTPtr2<K>> main;
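To make this self-contained, here is a stripped-down sketch of the whole wrapper and how the profiled code calls it (not my exact code; Physic_Object is reduced to just the id/hash part, and the functors are collapsed to the pointer case):
#include <unordered_set>
#include <cstddef>

struct Physic_Object {
    int id;
    long hqHashCode() const { return id; }
};

template<class K>
class MyEncapsulatorOf_unordered_map {
    //same ptr()/structTPtr1/structTPtr2 idea as above, reduced to pointer keys
    struct Hash  { std::size_t operator()(K const& k) const { return k->hqHashCode(); } };
    struct Equal { bool operator()(K const& a, K const& b) const { return a == b; } };
public:
    std::unordered_set<K, Hash, Equal> main;
    bool isContain(K const& k) const  { return main.find(k) != main.end(); }
    void insert(K const& k)           { main.insert(k); }
    void remove_immediate(K const& k) { main.erase(k); }
};

int main() {
    MyEncapsulatorOf_unordered_map<Physic_Object*> phy_concaves;
    Physic_Object* cave = new Physic_Object{3};
    phy_concaves.insert(cave);                //hash + possible node allocation
    if (phy_concaves.isContain(cave)) {       //find: hash + bucket walk
        phy_concaves.remove_immediate(cave);  //erase: hash + bucket walk + node free
        delete cave;
    }
}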
Note that I ran this in debug mode under Visual C++, but release mode yields a similar result.
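To try to answer the "40+ arithmetic operations" part myself, this is the kind of standalone measurement I have in mind (a rough sketch of how one might benchmark it, not taken from my game; the identity hash mirrors my hqHashCode):
#include <unordered_set>
#include <vector>
#include <chrono>
#include <cstdio>

struct Obj { int id; };
struct IdHash { std::size_t operator()(Obj* o) const { return o->id; } };

int main() {
    const int N = 1000, REPEAT = 10000;       //element count similar to my case
    std::vector<Obj> pool(N);
    for (int i = 0; i < N; ++i) pool[i].id = i;

    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < REPEAT; ++r) {
        std::unordered_set<Obj*, IdHash> s;
        s.reserve(N);                         //avoid rehashing noise
        for (auto& o : pool) s.insert(&o);      //insert
        for (auto& o : pool) (void)s.count(&o); //"isContain"
        for (auto& o : pool) s.erase(&o);       //"remove_immediate"
    }
    auto t1 = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("%.1f ns per insert+find+erase\n", ns / (double(N) * REPEAT));
}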
Why Unordered_Map hashing is too slow then my simple array hash seems to be the most closely related question, but it doesn't actually profile the performance of unordered_map.
I don't think it is about gcc either, so this link may not be relevant: Is gcc std::unordered_map implementation slow? If so - why?