C++ Find in a vector of

Question

So previously I only had 1 key I needed to look up, so I was able to use a map:

std::map <int, double> freqMap;

But now I need to look up 2 different keys. I was thinking of using a vector with std::pair i.e.:

std::vector <int, std::pair<int, double>> freqMap;

Eventually I need to look up both keys to find the correct value. Is there a better way to do this, or will this be efficient enough (will have ~3k entries). Also, not sure how to search using the second key (first key in the std::pair). Is there a find for the pair based on the first key? Essentially I can access the first key by:

freqMap[key1]

But not sure how to iterate and find the second key in the pair.

Edit: Ok adding the use case for clarification:

I need to look up a val based on 2 keys, a mux selection and a frequency selection. The raw data looks something like this:

Mux, Freq, Val
0, 1000, 1.1
0, 2000, 2.7
0, 10e9, 1,7
1, 1000, 2.2
1, 2500, 0.8
6, 2000, 2.2

Why do you talk about two *keys* but use `std::pair` for the *value*? When you say you need to look up both keys, do you mean you need to access the value given any one of the keys, or do you need to know both keys to find the value? — flyx, Aug 28 '20 at 16:28
Please show an example of what you want to do. I find the description a bit confusing. — cigien, Aug 28 '20 at 16:31
Yes I need both keys to find the value. I was thinking of using the first element in the pair (int) as the second key, and the double would be the value I want returned. — SSB, Aug 28 '20 at 16:32
Why not make the *Key* a `pair` then? i.e. have a `vector, double>>`. In which case, you could also do `map, double>` — cigien, Aug 28 '20 at 16:33
@cigien Would need implementation of `std::less>` for a `std::map`; I'd rather implement `std::hash` on it and use `std::unordered_map`. — flyx, Aug 28 '20 at 16:36

bitmask · Accepted Answer · 2020-08-28T17:22:39.420

The blanket answer to "which is faster" is generally "you have to benchmark it".

But besides that, you have a number of options. A std::map is more efficient than other data structures on paper, but not necessarily in practice. If you truly are in a situation where this is performance critical (i.e. avoid premature optimisation) try different approaches, as sketched below, and measure the performance you get (memory-wise and cpu-wise).

Instead of using a std::map, consider throwing your data into a struct, give it proper names and store all values in a simple std::vector. If you modify the data only seldom, you can optimise retrieval cost at the expense of additional insertion cost by sorting the vector according to the key you are typically using to find an entry. This will allow you to do binary search, which can be much faster than linear search.

However, linear search can be surprisingly fast on a std::vector because of both cache locality and branch prediction. Both of which you are likely to lose when dealing with a map, unordered_map or (binary searched) sorted vector. So, although O(n) sounds much more scary than, say, O(log n) for map or even O(1) for unordered_map, it can still be faster under the right conditions.

Especially if you discover that you don't have a discernible index member you can use to sort your entries, you will have to either stick to linear search in contiguous memory (i.e. vector) or invest into a doubly indexed data structure (effectively something akin to two maps or two unordered_maps). Having two indexes usually prevents you from using a single map/unordered_map.

If you can pack your data more tightly (i.e. do you need an int or would a std::uint8_t do the job?, do you need a double? etc.) you will amplify cache locality and for only 3k entries you have good chances of an unsorted vector to perform best. Although operations on an std::size_t are typically faster themselves than on smaller types, iterating over contiguous memory usually offsets this effect.

Conclusion: Try an unsorted vector, a sorted vector (+binary search), a map and an unordered_map. Do proper benchmarking (with several repetitions) and pick the fastest one. If it doesn't make a difference pick the one that is the most straight-forward to understand.

Edit: Given your example data, it sounds like the first key has an extremely small domain. As far as I can tell "Mux" seems to be limited to a small number of different values which are near each other, in such a situation you may consider using an std::array as your primary indexing structure and have a suitable lookup structure as your second one. For example:

std::array<std::vector<std::pair<std::uint64_t,double>>,10>
std::array<std::unordered_map<std::uint64_t,double>,10>

Another thing to consider is a use pattern. How often the data is modified and how often is it queried? In other words, what is more important: a cost of insertion or a cost of search? — Vlad Feinstein, Aug 28 '20 at 17:02
@VladFeinstein Yes, I pointed that out, but maybe didn't stress this enough. Let me see how I can improve it. — bitmask, Aug 28 '20 at 17:05
@bitmask thanks for the explanation. You are right, the mux selection will be 0-7 only, so I think I'm going to try out your suggestion of the std::array of maps. Fyi, it's a 1-time hit to insert all the elements, and the search is what will happen thousands of times. — SSB, Aug 28 '20 at 17:48
@SSB In that case, really see if using a vector isn't faster (sorted or unsorted). It sounds counter intuitive, I know, but I have seen it again and again. map and unordered_map are for convenience, array and vector are for performance. — bitmask, Aug 28 '20 at 18:10

C++ Find in a vector of

1 Answers1