
I have a question about lookup speed. I want to know which STL container gives the fastest lookup time in C++. unordered_map comes to mind since it is implemented as a hash table, but I am afraid its performance is penalized because it stores key-value pairs, whereas set stores only keys. I guess the answer will depend on 1) the data type of the key; and 2) the STL implementation of set.

In other words, which container is faster for checking the existence of a key: set, unordered_map, or something else?

Edit:

I would appreciate an answer with more explanation of the implementation or mechanism of the container. For instance, "unordered_map is fast because it's implemented as a hash map." That is more helpful than saying "it depends on the need". Thanks!

Steven Shi
  • Depends a lot; even a linear search in `std::vector` might be faster than a logarithmic search. – Jarod42 Feb 13 '19 at 10:35
  • You know there's an `unordered_set` too, right? – Sneftel Feb 13 '19 at 10:41
  • None of the standard containers will give you the fastest possible lookup time. There are better implementations of a hash table than the STL's unordered containers. And if you know your use case, you can optimize your own hash table implementation for that case (or, in specific cases, there can be structures better suited than hash tables). – geza Feb 13 '19 at 10:42

2 Answers


This depends to a large degree on the distribution of your data, the size of your dataset, the compiler, the toolchain...

The only way you can know is to measure it for your use case.

Do this after selecting the appropriate container for your task, then switch to something else only if you find that you need to and that you can get better performance for your use case by doing so.

Based on your question, I'd say the choice is between set and unordered_set. On the other hand, if you don't actually know yet whether your data have both keys and values, then you're probably not ready to start profiling your solution.
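
For illustration, here is a minimal measurement sketch along those lines; the key type, data distribution, and element count are placeholder assumptions, and only a benchmark on your own data and toolchain tells you anything meaningful:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

int main() {
    // Build one million random integer keys (placeholder data; use your real keys).
    constexpr std::size_t N = 1'000'000;
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 10'000'000);

    std::vector<int> keys(N);
    for (auto& k : keys) k = dist(rng);

    std::set<int> tree(keys.begin(), keys.end());            // balanced tree, O(log n) lookups
    std::unordered_set<int> hashed(keys.begin(), keys.end()); // hash table, O(1) average lookups

    // Time existence checks against each container.
    auto time_lookups = [&](const auto& container) {
        const auto start = std::chrono::steady_clock::now();
        std::size_t hits = 0;
        for (int k : keys) hits += container.count(k);
        const auto stop = std::chrono::steady_clock::now();
        std::printf("(%zu hits) ", hits);  // keep the loop from being optimized away
        return std::chrono::duration<double, std::milli>(stop - start).count();
    };

    std::printf("std::set:           %.1f ms\n", time_lookups(tree));
    std::printf("std::unordered_set: %.1f ms\n", time_lookups(hashed));
}
```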

Lightness Races in Orbit

Consider this from a different angle.

Since performance is what you're interested in here, you might want to design your data structure in a way that makes optimal use of the cache line.

If the number of elements is not too high, then a vector will outperform all other containers. This is because vectors store elements in contiguous memory locations, and your cache loves contiguous memory.
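
As a rough sketch of that point (the key type and element count are assumptions; whether the scan actually wins depends on your data, so measure it):

```cpp
#include <algorithm>
#include <vector>

// For a small number of keys, a linear scan over contiguous storage often
// beats tree or hash lookups: the whole vector fits in a few cache lines
// and there is no pointer chasing or hashing overhead.
bool contains(const std::vector<int>& keys, int key) {
    return std::find(keys.begin(), keys.end(), key) != keys.end();
}

// Usage: contains({3, 1, 4, 1, 5}, 4) returns true. Compare it against
// std::set / std::unordered_set at your actual element count before committing.
```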

You also mentioned key-value pairs hampering lookup speed. One way to get around this, from a cache-line perspective, is to store the keys in a contiguous data structure and do the lookup on the keys alone. Only when you have a hit do you read the corresponding value from your key-value pair.
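
A sketch of that layout, with made-up names (KeyValueStore, find) and parallel vectors standing in for whatever storage actually suits your data:

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Keys live in their own contiguous array, so the lookup scans only key
// data; the (possibly much larger) values are touched only after a hit.
struct KeyValueStore {
    std::vector<int> keys;            // scanned during lookup
    std::vector<std::string> values;  // values[i] belongs to keys[i]

    std::optional<std::string> find(int key) const {
        auto it = std::find(keys.begin(), keys.end(), key);
        if (it == keys.end()) return std::nullopt;
        return values[static_cast<std::size_t>(it - keys.begin())];
    }
};
```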

Check out this talk by Mike Acton for more on this.

Rajesh