3

Suppose I have a set (or map) of strings, and I want to use a custom comparator that compares only the first 5 characters. So "abcde" and "abcdef" are the same in my set.

typedef std::set<std::string, Cmp> MySet;

What is the best way to write Cmp?

The obvious way is like this:

struct Cmp
{
    bool operator()(const string& x, const string& y)
    {
        return (x.substr(0, 5) < y.substr(0, 5));
    }
}

The problem is that this code repeats .substr(0, 5). In this example it's pretty short, but in the general case it could be longer. I want to avoid this repeating code.

In general, given the types T1, T2 and the function T2 key(T1& const), I want a set of T1 elements that compares according to key(a) < key(b), where comparison on T2 is already well-defined. What is the best way to write this? I thought about writing a new class KeyBaseSet, but that would be over-designing for my single use-case. Is there some way to do this using std or Boost?

I'm looking for something similar to the key argument when sorting in Python (https://docs.python.org/3/howto/sorting.html#key-functions), or the compare `on` idiom in Haskell (https://stackoverflow.com/a/2788262/351105).

Tomer Vromen
  • 1,762
  • 1
  • 13
  • 13
  • Inexperienced in c++ as I am, pass a pointer to the beginning of each string and use strncmp(x, y, 5) – Marichyasana Jul 05 '20 at 16:03
  • 1
    The biggest problem with your code is that calling `substr` allocates a new string, and doing that twice just to compare them makes comparison a lot more expensive than it should be. @Evg's answer fixes this in a nice way. – Matt Timmermans Jul 05 '20 at 16:13

2 Answers2

3

You can customize Cmp with a key policy. Minimal example:

template<class Key>
struct Compare_on {
    Compare_on(Key key = Key()) : key_(key)
    {}

    template<class T>
    bool operator()(const T& x, const T& y) const {
        return key_(x) < key_(y);
    }

private:
    Key key_;
};

struct First3 {
    std::string_view operator()(const std::string& s) const {
        return std::string_view(s).substr(0, 3);
    }
};

// Example:
std::set<std::string, Compare_on<First3>> set;
set.insert("abc1");
set.insert("abc2");

Demo


Compare_on can be improved by making it a transparent comparator:

template<class Key>
struct Compare_on {
    using is_transparent = void;

    Compare_on(Key key = Key()) : key_(key)
    {}

    template<class T1, class T2>
    bool operator()(const T1& x, const T2& y) const {
        return key_(x) < key_(y);
    }

private:
    Key key_;
};

struct First3 {
    template<class T>
    std::string_view operator()(const T& s) const {
        return std::string_view(s).substr(0, 3);
    }
};

Now when we do

auto pos = set.find("abc");

no temporary std::string will be constructed for the string literal "abc".

Demo 2

Evg
  • 25,259
  • 5
  • 41
  • 83
  • This is a nice solution, and now that you wrote it, I see that it's pretty short as well! BTW, the optimization of not copying the string is nice, but in my question it was just an example - I don't really need this exact key comparison. – Tomer Vromen Jul 05 '20 at 16:58
  • @TomerVromen, depending on your use case, you might still benefit from transparent comparisons if `find` argument type is different from set key type. – Evg Jul 05 '20 at 17:01
0

Your problem could be solved with higher order functions. Essentially a function that takes the T2 key(const T1&) function and returns a new function that can be used as comparator for the std::set. Something like:

auto make_comp(T key) {
    return [key](const auto& v1, const auto& v2){return key(v1) < key(v2);};
}

Obviously this example requires generic lambdas, but it should be possible to achieve this with template structs as well :)

It should be possible to define a set as std::set<decltype(make_comp([](){...}))>.

Marti Nito
  • 697
  • 5
  • 17