-1

Lookup is to be done either by integer or by string. Both integer and strings are guaranteed to be unique in the collection. The size of the collection is ~500 of those pairs.

The interface of the collection would be 2 functions, one to translate the integer to string and one to translate the string to the corresponding integer.

It seems there is no methodology to pick up the "proper" container.

An idea would be to use std::vector, which is the default and since it will be stored in a contiguous memory we ensure cache hits? But I know the size of the collection at compile time, so std::array?

Or since order is not important but the look up needs to be fast we can use a hash based solution like an std::map? But then what is the key and what is the value? Or one can have 2 std::maps (std::map<int, std::string> and std::map(std::string, int)) with duplicated the information, since anyway the collection is small.

I know that the ultimate answer would be to benchmark it, but I am wondering if there is any actual methodology to know what to pick based on principle.

  • 1
    How often does the collection change? Is it static? If so, just use two lookup tables. – Konrad Rudolph Nov 30 '22 at 09:53
  • 4
    You really only know when you test it. Having two lookup tables may be a worthwhile tradeoff of size for speed. Otherwise maybe for such a small collection, a `std::array` may be fast enough. You can even sort that according to the integers and use binary search for those (and linear search for the strings) – perivesta Nov 30 '22 at 09:54
  • Maybe "boost bimap" or this "https://stackoverflow.com/questions/71215478/c-what-is-the-best-sorting-container-and-approach-for-large-datasets-millions/71361388#71361388" can help you. – A M Nov 30 '22 at 10:01
  • Or, Boost::MultiIndex: https://stackoverflow.com/q/39510143/580083. The difference is described [here](https://www.boost.org/doc/libs/1_79_0/libs/bimap/doc/html/boost_bimap/bimap_and_boost.html#boost_bimap.bimap_and_boost.bimap_and_multiindex). Quoting: _"Store an ID and a name for an employee, with fast search on each member. This type of problem is better modelled as a database table, and Boost.MultiIndex is the preferred choice."_ This seems to match your case pretty well. – Daniel Langr Nov 30 '22 at 10:06
  • related: https://stackoverflow.com/questions/181693/what-are-the-complexity-guarantees-of-the-standard-containers – nick Nov 30 '22 at 10:14
  • "...I am wondering if there is any actual methodology to know what to pick based on principle." that methodology is benchmarking, measuring and profiling. I'd always use `std::vector` as candidate, then try others – 463035818_is_not_an_ai Nov 30 '22 at 10:20
  • @KonradRudolph never. – Paris Moschovakos Nov 30 '22 at 12:12

1 Answers1

0

std::map is a sorted associative container where keys are sorted. Its implemented using height balanced tree where search, removal, and insertion operations have logarithmic complexity.

As key ordering is not required, using std::unordered_map will be more efficient. Unordered map is an associative container where search, insertion, and removal of elements have average constant-time complexity.

Having 2 unordered_maps would give O(1) time for each search operation.

Rama
  • 74
  • 4