8

In Python, given a list, I can sort it by a key function, e.g.:

>>> def get_value(k):
...     print "heavy computation for", k
...     return {"a": 100, "b": 30, "c": 50, "d": 0}[k]
...
>>> items = ['a', 'b', 'c', 'd']
>>> items.sort(key=get_value)
heavy computation for a
heavy computation for b
heavy computation for c
heavy computation for d
>>> items
['d', 'b', 'c', 'a']

As you see, the list was sorted not alphanumerically but by the return value of get_value().

Is there an equivalent in C++? std::sort() only allows me to provide a custom comparator (equivalent of Python's items.sort(cmp=...)), not a key function. If not, is there any well-tested, efficient, publicly available implementation of the equivalent I can drop into my code?

Note that the Python version only calls the key function once per element, not twice per comparison.

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
Claudiu
  • 224,032
  • 165
  • 485
  • 680

2 Answers2

4

You could just roll your own:

template <typename RandomIt, typename KeyFunc>
void sort_by_key(RandomIt first, RandomIt last, KeyFunc func) 
{
    using Value = decltype(*first);
    std::sort(first, last, [=](const ValueType& a, const ValueType& b) {
        return func(a) < func(b);
    });
}

If KeyFunc is too expensive, you'll have to create a separate vector with the values.

We can even hack together a class that will allow us to still use std::sort:

template <typename RandomIter, typename KeyFunc>
void sort_by_key(RandomIter first, RandomIter last, KeyFunc func)
{
    using KeyT = decltype(func(*first));
    using ValueT = typename std::remove_reference<decltype(*first)>::type;

    struct Pair {
        KeyT key;
        RandomIter iter;
        boost::optional<ValueT> value;

        Pair(const KeyT& key, const RandomIter& iter)
            : key(key), iter(iter)
        { }

        Pair(Pair&& rhs)
            : key(std::move(rhs.key))
            , iter(rhs.iter)
            , value(std::move(*(rhs.iter)))
        { }

        Pair& operator=(Pair&& rhs) {
            key = std::move(rhs.key);
            *iter = std::move(rhs.value ? *rhs.value : *rhs.iter);
            value = boost::none;
            return *this;
        }

        bool operator<(const Pair& rhs) const {
            return key < rhs.key;
        }
    };

    std::vector<Pair> ordering;
    ordering.reserve(last - first);

    for (; first != last; ++first) {
        ordering.emplace_back(func(*first), first);
    }

    std::sort(ordering.begin(), ordering.end());
}

Or, if that's too hacky, here's my original solution, which requires us to write our own sort

template <typename RandomIt, typename KeyFunc>
void sort_by_key_2(RandomIt first, RandomIt last, KeyFunc func)
{
    using KeyT = decltype(func(*first));
    std::vector<std::pair<KeyT, RandomIt> > ordering;
    ordering.reserve(last - first);

    for (; first != last; ++first) {
        ordering.emplace_back(func(*first), first);
    }

    // now sort this vector by the ordering - we're going
    // to sort ordering, but each swap has to do iter_swap too
    quicksort_with_benefits(ordering, 0, ordering.size());
}

Although now we have to reimplement quicksort:

template <typename Key, typename Iter>
void quicksort_with_benefits(std::vector<std::pair<Key,Iter>>& A, size_t p, size_t q) {
    if (p < q) {
        size_t r = partition_with_benefits(A, p, q);
        quicksort_with_benefits(A, p, r);
        quicksort_with_benefits(A, r+1, q);
    }
}

template <typename Key, typename Iter>
size_t partition_with_benefits(std::vector<std::pair<Key,Iter>>& A, size_t p, size_t q) {
    auto key = A[p].first;
    size_t i = p;
    for (size_t j = p+1; j < q; ++j) {
        if (A[j].first < key) {
            ++i;
            std::swap(A[i].first, A[j].first);
            std::iter_swap(A[i].second, A[j].second);
        }
    }

    if (i != p) {
        std::swap(A[i].first, A[p].first);
        std::iter_swap(A[i].second, A[p].second);
    }
    return i;
}

Which, given a simple example:

int main()
{
    std::vector<int> v = {-2, 10, 4, 12, -1, -25};

    std::sort(v.begin(), v.end());
    print(v); // -25 -2 -1 4 10 12

    sort_by_key_2(v.begin(), v.end(), [](int i) { return i*i; }); 
    print(v); // -1 -2 4 10 12 -25
}
Barry
  • 286,269
  • 29
  • 621
  • 977
  • 3
    Yep, but that's not very efficient if `func` computes something heavy. `func` here gets called twice per comparison, instead of once per element as it does in Python's version (I'll update the question to mention this) – Claudiu Apr 09 '15 at 20:41
  • 3
    @Claudiu It's a tradeoff between CPU and memory usage, and you can't get both. If the key function is lightweight, Barry's approach wins because it requires no additional memory. If the key function is heavyweight, Python's approach is better, at the cost of having to allocate another list for the precomputed keys. Python's approach can be emulated in C++ by computing an intermediate vector of keys applied to list elements, and comparing by indices in that vector. – user4815162342 Apr 09 '15 at 20:46
  • 1
    The edited answer contains an implementation of quicksort, which would indicate that it is in fact not possible to implement a Python-style key-based sort **in terms of** `std::sort`, and retain the genericity of the latter. The interface of `std::sort` is restricting in that it passes references to values instead of the actual iterators (which could be used to look up the keys in the vector of precomputed keys). – user4815162342 Apr 09 '15 at 20:57
  • @Claudiu Best way to only compute `func` once relies upon (1) making a `vector` of `pair`s and (2) implementing your own `sort` method. It works. Whether or not it's better depends entirely on how expensive `KeyFunc` is. – Barry Apr 09 '15 at 21:09
  • Interesting - +1 for that implementation. Can you go into why you can't re-use std::sort? Could you, say, make an additional map of iterators to their final positions, then pass a comparator to `std::sort` which looks up the iterator in that map? It would be additional overhead, true, but if std::sort is implemented better then it may be worth it. At that point it may be premature optimization being the root of all evil, though. – Claudiu Apr 09 '15 at 21:11
  • @Claudiu the comparator doesn't get the iterator though, it gets the value – Barry Apr 09 '15 at 21:34
  • @Barry: Oh yeah that's right, that's what user4815162342 was saying which I didn't get at first – Claudiu Apr 09 '15 at 21:37
  • @Claudiu Got a version using `std::sort` directly! – Barry Apr 09 '15 at 23:02
  • @Barry: Nice! I'll have to spend some time figuring that one out.. I'm not so good on move semantics yet. But what I gather is that you made it so whenever `std::sort` reorders items, `Pair` as a side-effect also reorders those items in the initially passed-in list? – Claudiu Apr 09 '15 at 23:12
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/74947/discussion-between-claudiu-and-barry). – Claudiu Apr 10 '15 at 16:10
2

If the key type is not terribly huge (if it is, measure I'd say), you can just save an

std::vector< std::pair<key_type, value_type>> vec;

instead of your "normal" value vector. You can then compute and safe the keys exactly once and then simply use std::sort.

Another, but intrusive method would be providing the key as a member and then caching it. This would have the advantage that you do not need to mess with the pairs every time you access your vector.

Baum mit Augen
  • 49,044
  • 25
  • 144
  • 182