1

I have a map of uint64_t to vector. The map takes an unusually long time (~70s) to build compared to unordered_map(~4s) for a string input of 4MB in size. Similarly, 120s vs 2400s for a string input of size ~150MB.

Unordered_map is generally faster than google sparse hash, but the difference in time is huge. Sparse hash is ~2 times slower(than unordered map) for a map to int to int.

I fail to understand the reason behind this time difference.

Code snippet:

int main(){
std::string input,reference;

while (getline(cin,input)) {
    reference += input;
    input.clear();
}

cout<<"length of reference = "<<reference.length()<<endl;

google::sparse_hash_map<uint64_t, vector<uint32_t>> m;
//unordered_map<uint64_t, vector<uint32_t>> m;
for (auto it = reference.begin(); it != reference.end(); it++) {
    m[it-reference.begin()].push_back(it-reference.begin());
}
return 0;
}

This is the result of /usr/bin/time on the program

length of reference = 4641652
    Command being timed: "./a.out"
    User time (seconds): 76.06
    System time (seconds): 0.31
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:16.43
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 308228
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 230585
    Voluntary context switches: 1
    Involuntary context switches: 2316
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
user1995120
  • 167
  • 1
  • 9
  • So, the operarion you are timing is storing a key value pair in a map where the key is many mb in size? I ask because I find your question unclear: your code has a loop, and I do not see why a minimal example woukd have a loop if I am right. – Yakk - Adam Nevraumont Oct 04 '15 at 08:49
  • @yakk So the input is a string. I iterate through the string and push every position into the hash. So we have a hash map with as many keys as size of the input string. The value corresponding to each key is a vector. – user1995120 Oct 04 '15 at 13:56
  • 1
    So your data is neither sparse, nor is it hashable. You are using offsets 64 bit as keys. You should be using a vector indexed by offsets instead of either container. – Yakk - Adam Nevraumont Oct 04 '15 at 14:47
  • You are building a map from i to [i] for all i from 0 to 4641651. Are you sure that your code is correct? – Pastafarianist Dec 15 '15 at 23:58
  • Well, yeah. The point is the sparse hash is slow in building a map of int to vector. I dont understand why simple(and useless) map construction is slow. Of course in my application I am doing something more non trivial with the map – user1995120 Dec 16 '15 at 00:06

0 Answers0