
I tested them with this code (on Visual Studio 2010 SP1):

#include <cstdlib>   // for system() and EXIT_SUCCESS
#include <ctime>
#include <iostream>
#include <map>
#include <unordered_map>
#include <hash_map>

int main()
{ 
    clock_t time;
    int LOOP = (1 << 16);
    std::map<int, int> my_map;
    std::unordered_map<int, int> map_unordered_map;
    std::hash_map<int, int> my_hash_map;

    time = clock();
    for (int i = 0; i != LOOP; ++i)
    {
        my_map[i] = i;
    }
    std::cout << "map: " << ((double)(clock() - time) / CLOCKS_PER_SEC) << std::endl;

    time = clock();
    for (int i = 0; i != LOOP; ++i)
    {
        map_unordered_map[i] = i;
    }
    std::cout << "unordered_map: " << ((double)(clock() - time) / CLOCKS_PER_SEC) << std::endl;

    time = clock();
    for (int i = 0; i != LOOP; ++i)
    {
        my_hash_map[i] = i;
    }
    std::cout << "hash_map: " << ((double)(clock() - time) / CLOCKS_PER_SEC) << std::endl;

    system("PAUSE");
    return EXIT_SUCCESS;
}

And the results are strange:

In DEBUG: map: 0.289 unordered_map: 10.738 hash_map: 10.58 Press any key to continue . . .

In RELEASE: map: 0.101 unordered_map: 0.463 hash_map: 0.429 Press any key to continue . . .

hythloday
    It could be that the `std::map` implementation is particularly tuned for increasing key insertion, you should test with random numbers instead. It could also be that 2^16 is too small to show the theoretical advantage of the hash containers. – Mark Ransom Aug 30 '12 at 16:37
  • std::map uses a red-black tree as its internal data structure, while std::hash_map uses a hash table. What you are seeing could be the cost of rehashing the hash table as it grows. What happens if you clear them and run the same insert a second time? – Jens Agby Aug 30 '12 at 16:47
  • Because if I set LOOP larger it gets really slow, so I finally set it to 1 << 16 so that I could run it over and over while checking for problems... just look at the 10-second result in DEBUG mode... – hythloday Aug 30 '12 at 16:52
  • Jens Agby is right... if I run the loop a second time after all the elements are inserted, the hash_map is significantly faster than the map... – hythloday Aug 30 '12 at 16:59
  • Added an answer describing our findings – Jens Agby Aug 30 '12 at 17:16

2 Answers

  1. You're only inserting 65536 items in each map -- not large enough for the difference between O(log N) and O(1) to mean a whole lot.
  2. You're only inserting items, not doing any searching afterwards.
  3. Your keys are all contiguous integers inserted in increasing order -- that doesn't reflect how either container is typically used.

Bottom line: this isn't likely to tell you much about the data structures in question.
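To illustrate, a benchmark that avoids those three pitfalls might look roughly like this (a sketch only, not part of the original answer; the larger element count, the shuffled key order, and the lookup pass are additions, and std::random_shuffle is used since the question targets VS2010):

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <map>
#include <unordered_map>
#include <vector>

int main()
{
    const int LOOP = (1 << 20);                      // more items, so the asymptotic behaviour shows
    std::vector<int> keys(LOOP);
    for (int i = 0; i != LOOP; ++i)
        keys[i] = i;
    std::random_shuffle(keys.begin(), keys.end());   // insert in non-sequential order

    std::map<int, int> my_map;
    std::unordered_map<int, int> my_unordered_map;

    clock_t t = clock();
    for (int i = 0; i != LOOP; ++i)
        my_map[keys[i]] = i;
    std::cout << "map insert: " << (double)(clock() - t) / CLOCKS_PER_SEC << std::endl;

    t = clock();
    for (int i = 0; i != LOOP; ++i)
        my_unordered_map[keys[i]] = i;
    std::cout << "unordered_map insert: " << (double)(clock() - t) / CLOCKS_PER_SEC << std::endl;

    // lookups are where the O(1) vs O(log N) difference is most visible
    long long sum = 0;

    t = clock();
    for (int i = 0; i != LOOP; ++i)
        sum += my_map[keys[i]];
    std::cout << "map lookup: " << (double)(clock() - t) / CLOCKS_PER_SEC << std::endl;

    t = clock();
    for (int i = 0; i != LOOP; ++i)
        sum += my_unordered_map[keys[i]];
    std::cout << "unordered_map lookup: " << (double)(clock() - t) / CLOCKS_PER_SEC << std::endl;

    std::cout << sum << std::endl;                   // use sum so the lookup loops are not optimised away
    return EXIT_SUCCESS;
}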

Jerry Coffin
  • I changed the code a little: first insert all the elements, then measure the time to search for every element. This time the result matches expectations... Thank you anyway... – hythloday Aug 30 '12 at 17:04

This is an example of the amortized vs. worst-case cost of an algorithm.

std::map uses a red-black tree, which has O(log N) insert complexity.
std::hash_map uses a hash table, which has O(1) amortized insert complexity.

However, the hash table has a worst-case complexity of O(N) when it has to resize and rehash the table.

In your case you end up doing a lot of rehashing, so the hash table insert hits its worst case often enough that the tree insert becomes faster - O(N) > O(log N).

If you initialize the hash_map with a large enough table, the hash table will never hit its worst case and it will be faster than the tree - O(1) < O(log N).
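For example, std::unordered_map lets you request a bucket count up front so the insert loop never triggers a rehash (a minimal sketch, not from the original answer; the bucket count of 1 << 17 is an assumed size chosen to cover the 1 << 16 inserts with the default max load factor of 1.0):

#include <iostream>
#include <unordered_map>

int main()
{
    const int LOOP = (1 << 16);

    std::unordered_map<int, int> map_unordered_map;
    map_unordered_map.rehash(1 << 17);      // ask for enough buckets up front (assumed size)

    // with the table pre-sized, every insert stays on the amortized O(1) path
    for (int i = 0; i != LOOP; ++i)
        map_unordered_map[i] = i;

    std::cout << map_unordered_map.size() << " elements, "
              << map_unordered_map.bucket_count() << " buckets" << std::endl;
    return 0;
}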

Jens Agby
  • So when insertions and erases both happen very often, use map; when loading a large amount of data all at once and later reading it by key, use some kind of hash_map. Am I right? – hythloday Aug 30 '12 at 17:26
  • Not really. The performance of `hash_map` depends on the hash function provided. If the hash function does not fit well with your input data, then the performance could be horrible. – timrau Aug 30 '12 at 17:31
  • @hythloday - `hash_map` should be your default choice if you are just doing insert, lookup and remove. Just make sure you init it to a good size for your data. Yes, there are some considerations with `hash_map` - the hashing function and a bad worst case. So make sure you have a basic understanding of how to get around those. The advantage of the `map` is that it's very cheap to iterate over the data in sorted order. If you don't care about that then `hash_map` should be your first choice. – Jens Agby Aug 31 '12 at 23:18
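To illustrate that last point (a sketch of my own, not from the comments): std::map iterates its keys in sorted order, while std::unordered_map makes no ordering guarantee.

#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

int main()
{
    std::map<std::string, int> ordered;
    std::unordered_map<std::string, int> hashed;

    ordered["banana"] = 2; ordered["apple"] = 1; ordered["cherry"] = 3;
    hashed["banana"] = 2;  hashed["apple"] = 1;  hashed["cherry"] = 3;

    // map: keys come out sorted - apple, banana, cherry
    for (std::map<std::string, int>::const_iterator it = ordered.begin(); it != ordered.end(); ++it)
        std::cout << it->first << " ";
    std::cout << std::endl;

    // unordered_map: iteration order depends on the hash table layout
    for (std::unordered_map<std::string, int>::const_iterator it = hashed.begin(); it != hashed.end(); ++it)
        std::cout << it->first << " ";
    std::cout << std::endl;

    return 0;
}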