
I tested them with this code (on Visual Studio 2010 SP1):

#include <cstdlib>   // for system() and EXIT_SUCCESS
#include <ctime>
#include <iostream>
#include <map>
#include <unordered_map>
#include <hash_map>

int main()
{ 
    clock_t time;
    int LOOP = (1 << 16);
    std::map<int, int> my_map;
    std::unordered_map<int, int> map_unordered_map;
    std::hash_map<int, int> my_hash_map;

    time = clock();
    for (int i = 0; i != LOOP; ++i)
    {
        my_map[i] = i;
    }
    std::cout << "map: " << ((double)(clock() - time) / CLOCKS_PER_SEC) << std::endl;

    time = clock();
    for (int i = 0; i != LOOP; ++i)
    {
        map_unordered_map[i] = i;
    }
    std::cout << "unordered_map: " << ((double)(clock() - time) / CLOCKS_PER_SEC) << std::endl;

    time = clock();
    for (int i = 0; i != LOOP; ++i)
    {
        my_hash_map[i] = i;
    }
    std::cout << "hash_map: " << ((double)(clock() - time) / CLOCKS_PER_SEC) << std::endl;

    system("PAUSE");
    return EXIT_SUCCESS;
}

And the results are strange:

In DEBUG: map: 0.289 unordered_map: 10.738 hash_map: 10.58 Press any key to continue . . .

In RELEASE: map: 0.101 unordered_map: 0.463 hash_map: 0.429 Press any key to continue . . .

hythloday
    It could be that the `std::map` implementation is particularly tuned for increasing key insertion, you should test with random numbers instead. It could also be that 2^16 is too small to show the theoretical advantage of the hash containers. – Mark Ransom Aug 30 '12 at 16:37
  • std::map uses a red-black tree as its internal data structure, while std::hash_map uses a hash table. What you are seeing could be the cost of rehashing the hash table as it grows. What happens if you clear them and run the same insert a second time? – Jens Agby Aug 30 '12 at 16:47
  • Because if I set LOOP larger it gets really slow, so I finally set it to 1 << 16 so that I could run it over and over while checking for problems... just look at the 10-second result in DEBUG mode... – hythloday Aug 30 '12 at 16:52
  • Jens Agby is right... if I run the loop a second time after all the elements are inserted, the hash_map is significantly faster than the map... – hythloday Aug 30 '12 at 16:59
  • Added an answer describing our findings – Jens Agby Aug 30 '12 at 17:16

2 Answers

  1. You're only inserting 65536 items in each map -- not large enough for the difference between O(log N) and O(1) to mean a whole lot.
  2. You're only inserting items, not doing any searching afterwards.
  3. Your keys are all contiguous integers inserted in increasing order -- that doesn't reflect how either container is typically used.

Bottom line: this isn't likely to tell you much about the data structures in question.
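To illustrate, a benchmark that avoids those three pitfalls might look roughly like this (a sketch only, not part of the original answer; the larger element count, the shuffled key order, and the lookup pass are additions, and std::random_shuffle is used since the question targets VS2010):

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <map>
#include <unordered_map>
#include <vector>

int main()
{
    const int LOOP = (1 << 20);                      // more items, so the asymptotic behaviour shows
    std::vector<int> keys(LOOP);
    for (int i = 0; i != LOOP; ++i)
        keys[i] = i;
    std::random_shuffle(keys.begin(), keys.end());   // insert in non-sequential order

    std::map<int, int> my_map;
    std::unordered_map<int, int> my_unordered_map;

    clock_t t = clock();
    for (int i = 0; i != LOOP; ++i)
        my_map[keys[i]] = i;
    std::cout << "map insert: " << (double)(clock() - t) / CLOCKS_PER_SEC << std::endl;

    t = clock();
    for (int i = 0; i != LOOP; ++i)
        my_unordered_map[keys[i]] = i;
    std::cout << "unordered_map insert: " << (double)(clock() - t) / CLOCKS_PER_SEC << std::endl;

    // lookups are where the O(1) vs O(log N) difference is most visible
    long long sum = 0;

    t = clock();
    for (int i = 0; i != LOOP; ++i)
        sum += my_map[keys[i]];
    std::cout << "map lookup: " << (double)(clock() - t) / CLOCKS_PER_SEC << std::endl;

    t = clock();
    for (int i = 0; i != LOOP; ++i)
        sum += my_unordered_map[keys[i]];
    std::cout << "unordered_map lookup: " << (double)(clock() - t) / CLOCKS_PER_SEC << std::endl;

    std::cout << sum << std::endl;                   // use sum so the lookup loops are not optimised away
    return EXIT_SUCCESS;
}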

Jerry Coffin
  • I changed the code a little: first insert all the elements, then measure the time to search for every element. This time the result matches expectations... Thank you anyway... – hythloday Aug 30 '12 at 17:04

This is an example of the amortized vs. worst-case cost of an algorithm.

std::map uses a red-black tree, which has O(log N) insert complexity.
std::hash_map uses a hash table, which has O(1) amortized insert complexity.

However, the hash table has a worst-case complexity of O(N) when it has to resize and rehash the table.

In your case you end up doing a lot of rehashing, so the hash table insert hits its worst case often enough that the tree insert becomes faster - O(N) > O(log N).

If you initialize the hash_map with a large enough table, the hash table will never hit its worst case and it will be faster than the tree - O(1) < O(log N).
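For example, std::unordered_map lets you request a bucket count up front so the insert loop never triggers a rehash (a minimal sketch, not from the original answer; the bucket count of 1 << 17 is an assumed size chosen to cover the 1 << 16 inserts with the default max load factor of 1.0):

#include <iostream>
#include <unordered_map>

int main()
{
    const int LOOP = (1 << 16);

    std::unordered_map<int, int> map_unordered_map;
    map_unordered_map.rehash(1 << 17);      // ask for enough buckets up front (assumed size)

    // with the table pre-sized, every insert stays on the amortized O(1) path
    for (int i = 0; i != LOOP; ++i)
        map_unordered_map[i] = i;

    std::cout << map_unordered_map.size() << " elements, "
              << map_unordered_map.bucket_count() << " buckets" << std::endl;
    return 0;
}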

Jens Agby
  • So when insertions and erases both happen very often, use map; when loading a large amount of data all at once and later reading it by key, use some kind of hash_map. Am I right? – hythloday Aug 30 '12 at 17:26
  • Not really. The performance of `hash_map` depends on the hash function provided. If the hash function does not fit well with your input data, then the performance could be horrible. – timrau Aug 30 '12 at 17:31
  • @hythloday - `hash_map` should be your default choice if you are just doing insert, lookup and remove. Just make sure you init it to a good size for your data. Yes, there are some considerations with `hash_map` - the hashing function and a bad worst case. So make sure you have a basic understanding of how to get around those. The advantage of the `map` is that it's very cheap to iterate over the data in sorted order. If you don't care about that then `hash_map` should be your first choice. – Jens Agby Aug 31 '12 at 23:18
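To illustrate that last point (a sketch of my own, not from the comments): std::map iterates its keys in sorted order, while std::unordered_map makes no ordering guarantee.

#include <iostream>
#include <map>
#include <string>
#include <unordered_map>

int main()
{
    std::map<std::string, int> ordered;
    std::unordered_map<std::string, int> hashed;

    ordered["banana"] = 2; ordered["apple"] = 1; ordered["cherry"] = 3;
    hashed["banana"] = 2;  hashed["apple"] = 1;  hashed["cherry"] = 3;

    // map: keys come out sorted - apple, banana, cherry
    for (std::map<std::string, int>::const_iterator it = ordered.begin(); it != ordered.end(); ++it)
        std::cout << it->first << " ";
    std::cout << std::endl;

    // unordered_map: iteration order depends on the hash table layout
    for (std::unordered_map<std::string, int>::const_iterator it = hashed.begin(); it != hashed.end(); ++it)
        std::cout << it->first << " ";
    std::cout << std::endl;

    return 0;
}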