I have a `std::map` in which the keys are objects with many (sometimes a dozen or more) fields of various types. The map uses the objects' `operator<` to compare the keys.
I have to choose the order in which the many fields get processed in `operator<`. If I choose wisely, the outcome of `operator<` can usually be decided from the first few fields processed. If I choose poorly, `operator<` may usually have to inspect most fields before the outcome is known. I need to process vast numbers of such objects, so the order of the comparisons can make a large difference to how much time is spent evaluating `operator<`. There is no guarantee that whichever order is best for today's batch of objects will still be best for tomorrow's batch.
Is there a way to dynamically adjust the order of the comparisons in `operator<` based on the measured cost of recent evaluations?
I thought of something based on a vector of pointers to data members, but that seems to require that all fields have the same type, which is usually not the case for my objects.
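To illustrate the mixed-type obstacle, one possible workaround is to erase the per-field types behind a uniform three-way comparison signature, so the inspection order becomes a reorderable vector. This is only a sketch under assumed names: the `Key` type, its fields, and `KeyLess` are hypothetical stand-ins, not anything from my actual code.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical key type with fields of different types.
struct Key {
    int         a;
    double      b;
    std::string c;
};

// Each field comparator returns <0, 0, or >0, like strcmp,
// which erases the field's type behind a uniform signature.
using FieldCmp = std::function<int(const Key&, const Key&)>;

std::vector<FieldCmp> makeFieldCmps() {
    return {
        [](const Key& x, const Key& y) { return (x.a > y.a) - (x.a < y.a); },
        [](const Key& x, const Key& y) { return (x.b > y.b) - (x.b < y.b); },
        [](const Key& x, const Key& y) { return x.c.compare(y.c); },
    };
}

// The comparator walks the vector in its current order and stops at the
// first field that differs, so a good order means few fields inspected.
struct KeyLess {
    std::vector<FieldCmp> cmps = makeFieldCmps();
    bool operator()(const Key& x, const Key& y) const {
        for (const auto& cmp : cmps) {
            if (int r = cmp(x, y)) return r < 0;
        }
        return false;  // all fields equal
    }
};
```

Reordering `cmps` then changes the inspection order without touching the field types, at the cost of one `std::function` indirection per field.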
Clarification (2021-10-25)
Suppose that the keys are objects with 10 fields, and that in the data the first 9 fields inspected by `operator<` happen to have the same value in every object that occurs. Then the identity (and order) of the objects is in practice decided by the 10th field, yet we incur the cost of inspecting all 10 fields on every `operator<` call.
Now let's rearrange the order in which the fields are inspected by `operator<` by making the formerly 10th field be inspected first. If the two objects offered to `operator<` differ in that newly first field, then inspecting that field is sufficient to determine the return value, and the other 9 fields don't need to be inspected at all, so the cost of evaluating `operator<` is less.
So, it is desirable to be able to optimize the order in which `operator<` inspects the fields so that its cost is minimized.
It may be prohibitively expensive to change the order of the keys in a `std::map` while it is in use, but it would already help to be able to deduce the optimal (or at least a better) order during one run and then use that new order during the next run, on the assumption that the optimal order for the next run is similar to the optimal order for the previous one.
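The measure-in-one-run, reorder-for-the-next idea could be sketched roughly as follows: count, per field, how often that field settled the outcome, then sort the comparators by that count for the next run. Everything here is a hypothetical illustration (the `Key` fields, `MeasuringLess`, `improvedOrder` are invented names), not a drop-in implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical key type with fields of different types.
struct Key {
    int         a;
    double      b;
    std::string c;
};

// Three-way per-field comparison, type-erased so the list is reorderable.
using FieldCmp = std::function<int(const Key&, const Key&)>;

struct MeasuringLess {
    std::vector<FieldCmp> cmps;
    mutable std::vector<std::size_t> settled;  // settled[i]: outcomes decided by cmps[i]

    explicit MeasuringLess(std::vector<FieldCmp> c)
        : cmps(std::move(c)), settled(cmps.size(), 0) {}

    bool operator()(const Key& x, const Key& y) const {
        for (std::size_t i = 0; i < cmps.size(); ++i) {
            if (int r = cmps[i](x, y)) {
                ++settled[i];      // this field decided the comparison
                return r < 0;
            }
        }
        return false;  // all fields equal
    }

    // After a run: comparators reordered so the most decisive fields come first,
    // for use on the next run.
    std::vector<FieldCmp> improvedOrder() const {
        std::vector<std::size_t> idx(cmps.size());
        std::iota(idx.begin(), idx.end(), 0);
        std::stable_sort(idx.begin(), idx.end(),
            [&](std::size_t i, std::size_t j) { return settled[i] > settled[j]; });
        std::vector<FieldCmp> out;
        for (std::size_t i : idx) out.push_back(cmps[i]);
        return out;
    }
};
```

One caveat if this were used as the comparator of a `std::map`: the map stores its own copy of the comparator, so the counts would have to be read back via `key_comp()` (which returns a copy) or kept behind a shared pointer. Reordering must only happen between runs, since a live map's ordering invariant depends on the comparator staying fixed.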