60

I came across this function here.

I am baffled as to how this would be implemented -- how does the key function generated by cmp_to_key know what "position" a given element should be without checking how the given element compares with every other element of interest?

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
math4tots
  • 8,540
  • 14
  • 58
  • 95

1 Answers1

78

The cmp_to_key method returns a special object that acts as a surrogate key:

class K(object):
    __slots__ = ['obj']
    def __init__(self, obj, *args):
        self.obj = obj
    def __lt__(self, other):
        return mycmp(self.obj, other.obj) < 0
    def __gt__(self, other):
        return mycmp(self.obj, other.obj) > 0
    def __eq__(self, other):
        return mycmp(self.obj, other.obj) == 0
    def __le__(self, other):
        return mycmp(self.obj, other.obj) <= 0
    def __ge__(self, other):
        return mycmp(self.obj, other.obj) >= 0
    def __ne__(self, other):
        return mycmp(self.obj, other.obj) != 0
    def __hash__(self):
        raise TypeError('hash not implemented')

When sorting, each key will get compared to most other keys in the sequence. Is this element at position 0 lower than or greater than that other object?

Whenever that happens, the special method hooks are invoked, so __lt__ or __gt__ is called, which the surrogate key turns into a call to the cmp method instead.

So the list [1, 2, 3] is sorted as [K(1), K(2), K(3)], and if, say, K(1) is compared with K(2) to see if K(1) is lower, then K(1).__lt__(K(2)) is called, which is translated to mycmp(1, 2) < 0.

This is how the old cmp method was working anyway; return -1, 0 or 1 depending on wether the first argument is lower than, equal to or greater than the second argument. The surrogate key translates those numbers back to boolean values for the comparison operators.

At no point does the surrogate key need to know anything about absolute positions. It only needs to know about one other object it is being compared with, and the special method hooks provide that other object.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Wow, an actual instance were Java has a much simpler solution to the same problem than Python. There aren't many. – jeremyjjbrown Aug 26 '14 at 13:54
  • 4
    @jeremyjjbrown: the `cmp_to_key` function only exists to support legacy Python 2 sorting codebases. The simpler pythonic method is to use a `key` function for `sorted()` instead, and if the sorting algorithm call for a `cmp()`-like approach, implement custom rich comparison hooks directly. – Martijn Pieters Aug 26 '14 at 14:00
  • 14
    Sometimes it's very difficult to come up with a `key` but easy to come up with a compare function... e.g., I want to sort a list of numbers so that all odd numbers come first, in ascending order, and then all even numbers come next, in descending order. – rlbond Jul 17 '15 at 00:42
  • 1
    @rlbond return a tuple with `(n % 2 == 0, (n % 2 and 1 or -1) * n)` for the key then. – Martijn Pieters Jul 17 '15 at 07:29
  • 6
    I am aware that you can make a key for that problem. But I could easily come up with something more difficult. – rlbond Jul 17 '15 at 17:11
  • 2
    @rlbond: then create objects that define their own ordering; you can use those as the key. That includes defining your own rich comparison hooks, as the `K` class used by `cmp_to_key()` does. – Martijn Pieters Jul 17 '15 at 17:33
  • 1
    Come to this post after I noticed the cmp_to_key function in python doc. Just want to point out that the key function generated like this doesn't have the key benefit of key parameter: to avoid computing the "key" multiple times for an element. – user49117 Mar 11 '16 at 13:18
  • 1
    @user49117: The key is computed just *once* for each element, after which the key's rich comparison hooks will be used exactly as often as a `cmp` function in Python 2 would have been used. Yes, if you can it is better to use a [Swartzian transform](https://en.wikipedia.org/wiki/Schwartzian_transform) by specifying a straight-up sort key function instead, but it is no more inefficient than using the `cmp` callback in a Python 2 sort would have been. – Martijn Pieters Mar 11 '16 at 13:20
  • Thanks for the great explanation. I have a question that does Python create a new list [K(1), K(2), K(3)] indeed? – roachsinai Nov 19 '19 at 16:31
  • 1
    @roachsinai: When using the `key` argument to `sorted()` and `list.sort()`, an *array* (a fixed-size C-level data structure) of `key(value)` entries is created, yes. For `cmp_to_key(value)` returns `K(value)` so that's one `K()` object for every value in the input sequence being sorted. – Martijn Pieters Nov 21 '19 at 15:13