I have a set of N
customers, indexed 0,...,N-1
. Periodically, for some subset S
of customers, I need to evaluate a function f(S)
. Computing f(S)
is of linear complexity in |S|
. The set S
of customers is represented as an object of type std::vector<int>
. The subsets that come up for evaluation can be of different size each time. [Since the order of customers in S
does not matter, the set can as well be represented as an object of type std::set<int>
or std::unordered_set<int>
.]
In the underlying application, I may have the same subset S
of customers come up multiple times for evaluation of f(S)
. Instead of incurring the needless linear complexity each time, I am looking to see if it would benefit from some sort of less computational burdensome lookup.
I am considering having a map of key-value pairs where the key is directly the vector of customers, std::vector<int> S
and the value mapped to this key is f(S)
. That way, I am hoping that I can first check to see if a key already exists in the map, and if it does, I can look it up without having to compute f(.)
again.
Having an std::map
with std::vector
as keys is well-defined. See, for e.g., here.
CPPReference indicates that map lookup time is logarithmic. But I suppose this is logarithmic in the number of key
s where each key if of a constant length -- such as an int
or a double
, etc. How is the complexity affected where the key itself need not be of constant length and can be of arbitrary length upto size N
?
Since the keys can themselves be of different sizes (subset of customers that come up for evaluation could be different each time), does this introduce any additional complexity in computing a hash function or the compare operation for the std::map
? Is there any benefit to maintain the key as a binary array a fixed length N
? This binary array is such that B_S[i]=1
if the i
th customer is in set S
, and it is 0 otherwise. Does this make the lookup any easier?
I am aware that ultimately the design choice between reevaluating f(S)
each time versus using std::map
would have to be done based on actual profiling of my application. However, before implementing both ideas (the std::map
route is more difficult to code in my underlying application), I would like to know if there are any known pre-existing best-practices / benchmarks.