I have a huge array (say, `ParticleId[]`) of unique integers (representing particle IDs) stored in random order in memory. I need to build a hash table to map each ID to its location inside the array, i.e., from ID to index. The IDs are not necessarily contiguous integers, so a simple look-up array is not a good solution.
I am currently using the `unordered_map` container of C++11 to achieve this. The map is initialized with a loop:
```cpp
unordered_map<ParticleId_t, ParticleIndex_t> ParticleHash;
ParticleHash.rehash(NumberOfParticles);
ParticleHash.reserve(NumberOfParticles);
for (ParticleIndex_t i = 0; i < NumberOfParticles; i++)
    ParticleHash[ParticleId[i]] = i;
```
`ParticleId_t` and `ParticleIndex_t` are simply typedef-ed integers. `NumberOfParticles` can be huge (e.g., 1e9). The `ParticleId[]` array and `NumberOfParticles` are `const` as far as the hash table is concerned.
Currently it takes quite a lot of time to build the `unordered_map` as above. My questions are:
- Is `unordered_map` the optimal choice for this problem? Would `map` be faster to initialize, even though it may not be as efficient in the look-up?
- Is it possible to speed up the initialization? Is it much faster to use `ParticleHash.insert()` than `ParticleHash[] =`, or any other function?
- Given that my keys are known to be unique integers, is there a way to optimize the map as well as the insertion?
I am considering Intel TBB's `concurrent_unordered_map` to parallelize the construction. However, that would introduce a dependency on the TBB library, which I would like to avoid if possible. Is there an easy solution using only the native STL containers?
Update:
I have now reverted to a plain sorted index table and rely on `bsearch` for the lookup. Initialization of the table is now 20 times faster and can be easily parallelized.