How to dynamically optimize order of fields in multi-field comparison

Question

I have a std::map in which the keys are objects with many (sometimes a dozen or more) fields of various types. The map uses the object's operator< to compare the objects.

I have to choose the order in which the many fields get processed in operator<. If I choose wisely, then the outcome of operator< can usually be decided based on the first few processed fields. If I choose poorly, then operator< may usually need to look at most fields before it knows the outcome. I need to process vast numbers of such objects, so the choice of the order of the comparisons might make a large difference to how much time gets spent in evaluating operator<. There is no guarantee that whichever order is best for today's batch of objects will still be the best for tomorrow's batch.

Is there a way to dynamically adjust the order of the comparisons in operator< based on the measured cost of recent evaluations?

I thought of something based on a vector of "pointers to data member", but that seems to require that all fields be of the same type, which usually isn't the case for my objects.

Clarification (2021-10-25)

Suppose that the keys are objects with 10 fields, and that in the data the first 9 fields that are inspected by operator< happen to have the same value in all objects that occur in the data. Then the identity (and order) of the objects is in practice decided by the 10th field. We incur the cost of inspecting all 10 fields for every operator< call.

Now let's rearrange the order in which the fields are inspected by operator< by making the formerly 10th field be inspected first. If the two objects offered to operator< differ in that newly first field, then inspecting that field is sufficient to determine the return value of operator<, and the other 9 fields don't need to be inspected at all, so the cost of evaluating operator< is less.

So, it is desirable to be able to optimize the order in which operator< inspects the fields so the cost is minimized.

It may be prohibitively expensive to change the order of the keys in a std::map while it is in use, but it would already help to be able to deduce the optimal (or at least a better) order during one run and then use that new order during the next run, on the assumption that the optimal order for that next run is similar to the optimal order for the previous run.

463035818_is_not_an_ai · Answer 1 · 2021-10-01T11:39:51.810

0

For comparing several members (only 2 in the example) std::tie is very handy:

bool compare(const foo& a, const foo& b){
    return std::tie(a.x,a.y) < std::tie(b.x,b.y);
}

This can be made more generic by using member pointers:

#include <tuple>
#include <iostream>
struct foo {
    int x = 0;
    double y = 42;
};

bool compare(const foo& a, const foo& b){
    return std::tie(a.x,a.y) < std::tie(b.x,b.y);
}

template <auto ...m>
bool compare(const foo& a, const foo& b){
    return std::tie( ((a.*m),...)) < std::tie(((b.*m),...));
}

int main() {
    foo f1{1,2};
    foo f2{2,1};
    std::cout << compare<&foo::x,&foo::y>(f1,f2) << "\n";
    std::cout << compare<&foo::y,&foo::x>(f1,f2) << "\n";
}

Output:

0
1

The template parameters must be known at compile time, but you can map between runtime and compile time with some boilerplate:

auto select_comp() {
     if (x is most likely different) return  compare<&foo::x,&foo::y>(f1,f2);
     else if (y changes most) return compare<&foo::y,&foo::x>(f1,f2);
     // ...
}

but that seems to require that all fields be of the same type, which usually isn't the case for my objects.

auto template parameters are available since C++17. Before it was possible only a little more complicated (see eg https://stackoverflow.com/a/38044251/4117728).

PS: Actually I am not sure if I understood the question, because changing the order of members not only makes the comparison return earlier or later, but it also changes the result. What you describe seems to fit for ==, where order typically does not change the result.

edited Oct 01 '21 at 11:39

answered Oct 01 '21 at 11:34

463035818_is_not_an_ai

109,796
11
89
185

Changing the order of the members changes the order of the keys, but I don't mind about that. In the end the values get dumped to a database, and the order in which they are dumped is unimportant. My problem is that I don't know beforehand which "x is most likely different" and which "y changes most" after that. – Louis Strous Oct 25 '21 at 10:08
@LouisStrous it is not possible to change the comparator of a `std::map` dynamically. That would break the class invariant. If order doesn't matter why use a `std::map` in the first place? If you do use a `std::unordered_map` you could have a dynamically selfadjusting equality comparison (because result of `a == b` does not depend on order of members) – 463035818_is_not_an_ai Oct 25 '21 at 10:40
If I use a `std::unordered_map` then I must provide a hash function that includes all members every time. There is no room for sometimes ignoring some fields there. The possibly self-adjusting `operator==` only gets used if there is a hash collision -- which for a good hash function means that the two objects being compared are nearly always equivalent, in which case `operator==` would also need to look at all members, and the order of the comparisons makes no difference. – Louis Strous Oct 25 '21 at 11:04
@LouisStrous actually I wasnt sure how it would interfer with hashing. If order doesnt matter how about good ol' `std::vector` then? I don't quite understand why you need to compare. Because elements must be unique? – 463035818_is_not_an_ai Oct 25 '21 at 11:27
I need to aggregate statistics for each unique key, and the possible range of keys is effectively infinite. A `std::vector` is not suitable then. An associative array such as a `std::map` is suitable but requires a comparison operation such as `operator<` for its keys. – Louis Strous Oct 26 '21 at 14:40

How to dynamically optimize order of fields in multi-field comparison

Clarification (2021-10-25)

1 Answers1