How to convert a Python 2 comparison function to a Python 3 key function?

Question

In Python 2, there is a comparison function.

A comparison function is any callable that accept two arguments, compares them, and returns a negative number for less-than, zero for equality, or a positive number for greater-than.

In Python 3, the comparison function is replaced with a key function.

A key function is a callable that accepts one argument and returns another value to be used as the sort key.

Now, I've a list of tuple[int, int, str] that I want to sort, and the string can be S or E. There are some tie breaker rules that use values from two tuples.

Given a tuple x: int, y: int, z: str, the rules are as follows:

If x are equal, and both z = S, larger y comes first.
If x are equal, and both z = E, smaller y comes first.
If x are equal, and one z = E another z = S, S record comes first.

A Python 2 style implementation is as follows:

def _cmp(coord1: tuple[int, int, str], coord2: tuple[int, int, str]) -> int:
    x1, y1, type1 = coord1
    x2, y2, type2 = coord2
    if x1 != x2:
        return x1 - x2
    if type1 == type2:
        return y2 - y1 if type1 == "S" else y1 - y2
    return -1 if type1 == "S" else 1

Expected output:

[(0, 3, "S"), (0, 2, "S"), (1, 2, "E"), (2, 3, "E")],
[(3, 3, "S"), (4, 2, "S"), (5, 2, "E"), (5, 3, "E")],
[(6, 2, "S"), (7, 3, "S"), (7, 2, "E"), (8, 3, "E")]

I'm aware that a comparison function can be converted to a key function using functools.cmp_to_key; however, I'm wondering if there's a way to implement it directly as a key function since the docs say:

This function is primarily used as a transition tool for programs being converted from Python 2 which supported the use of comparison functions.

Does this answer your question? [How to use a custom comparison function in Python 3?](https://stackoverflow.com/questions/2531952/how-to-use-a-custom-comparison-function-in-python-3) — Lord Baryhobal, Aug 28 '21 at 22:25
To be clear, the problem is that you directly want to write logic for a `key` function that produces the same ordering as this `cmp` used to, not using a wrapper? So, what you need to do is come up with a rule that converts your values to a simpler, directly comparable value, in such a way that those converted values have the same comparison result as the result that you want for the original values. Can you do that? — Karl Knechtel, Aug 28 '21 at 22:25
"the rules are as follows" Please make sure that you think about all cases. In particular, what should happen if the `x` values are *not* equal? — Karl Knechtel, Aug 28 '21 at 22:25

score 4 · Accepted Answer · answered Aug 28 '21 at 22:34

4

Consider how tuples normally compare: element by element, going to the next element when the current values are equal (sometimes called lexicographic order).

Our required comparison algorithm, rewritten in steps that match that general approach, is:

First, we want to compare the x values, putting them in ascending order.
Then we want to compare the z values; we want tuples with an S to come first. This is the opposite of what normally happens, and we can't easily specify reverse order for only part of the key, and we can't negate a string value. However, since only S and E are possible, we can simply map S to 0 and E to 1. Or more simply, S can map to False and E to True, since those are numerically equal to 0 and 1 respectively.
Finally, if the z values were equal, we want to compare the y values - in normal order if we have a E, and in reverse order (so, negate the numeric values) if we have a S.

So, we create a key that corresponds to that rule:

The first element is x.
The second element is our translation of the original z value.
The third element is the y value, possibly negated depending on z.

In code:

def fancy_key(xyz: tuple[int, int, str]) -> tuple[int, bool, int]:
    x, y, z = xyz
    is_e = z == 'E'
    return (x, is_e, (y if is_e else -y))

answered Aug 28 '21 at 22:34

Karl Knechtel

62,466
11
102
153

As an aside, consider using an `Enum` to represent your S/E "type" value. – Karl Knechtel Aug 28 '21 at 22:38
[tested](https://tio.run/##ZVHNbgIhEL7zFBMvQkNNq7WaJh59gh6NMRt3iMRd2AAm4stvB7CWVLJk@eb7GcgMMZysWawHN47K2R5cY1r66X6wLoA/XZTqkGVKB3TB2s7/sgO6/hKaoK3xjOF1wGPwsIEdA1o7/iZhIWHyPRESEpg/wHsB2wzmRUZgL@/ORe38qJ3L2rl8dn7W4lUds6qd6z8n2zPWooIzRh7EV865SogSbvSWkDHSaTrbTmdKm5bfRC46DBdnkhbpe4klRjXmGA8p7BpvT3FUyxXtDymTShS8ndZ5nNSJpltG0KpIsfMIr1EIxiianI82jA1Om8CbruOepoItT3OR6T0b2iK1KLPJXcpS1t2LoM395P/xKSax9Zh5kQohxvEH) – no comment Aug 28 '21 at 22:41
I've tested this with larger dataset and it works. The explanation makes sense too. Thanks. One minor change I did is to convert the `bool` to an `int` for clarity. – Abhijit Sarkar Aug 28 '21 at 22:46
2

OP posted this same question yesterday (without the test cases) and I wrote out this exact function (line for line) right before it got closed. Happy to see I interpreted the vague description correctly in light of the example data. :D My solution to the S/E thing was to type it as `Union[Literal["S"], Literal["E"]]`. – Samwise Aug 28 '21 at 22:46
@Samwise I saw your comment but unfortunately, the question got quickly closed, even though the comparison function is trivial to come up with given the rules. – Abhijit Sarkar Aug 28 '21 at 22:48
@KarlKnechtel Is there a way to make the interpreter show the derivation? I mean, given a comparison function, show me the equivalent key function, given that they figured out a way to do it generically in `functools.cmp_to_key`. – Abhijit Sarkar Aug 28 '21 at 22:52
My answer to yesterday's question (also written up just before it was too hastily closed) was actually going to revolve around `cmp_to_key` and how it manages this, since you'd specifically mentioned it and the idea of a generic conversion from comparison functions to key "functions." I'll edit it a bit and then add it as another answer here, since it won't fit in a comment. Spoiler: `cmp_to_key()` 's generated "key function" is neither much of a function nor does it generate a key per se, at least in any way one typically thinks of one. (Really, we didn't generate one here either!) – George Aug 28 '21 at 23:10
1

@AbhijitSarkar You can see cmp_to_key's key function in the source code linked to at the top of the documentation's page. – no comment Aug 28 '21 at 23:20
@don'ttalkjustcode I assume you're trying to be helpful, and I appreciate that, but yet again you answer a different question than the one asked. My question was "_given a comparison function, show me the equivalent key function_". I understand I can study the generic `cmp_to_key` source code, but I'm not interested in that. I'm interested in the output of what `cmp_to_key` does. In other words, if it had a parameter `debug`, it'd dump the key fn to stdout. – Abhijit Sarkar Aug 29 '21 at 00:29
@AbhijitSarkar Uh... you say you're not interested in the `cmp_to_key` source code and the in the very next sentence you say you're interested in its output. Since that output is literally hardcoded in the source code, you just said you're not interested in what you're interested in. You make no sense. – no comment Aug 29 '21 at 00:35
@don'ttalkjustcode The output isn't hardcoded in the [source code](https://github.com/python/cpython/blob/3.9/Lib/functools.py#L206). It can't possibly be, because the output changes with the input. Anyway, I'll get my coat without wasting more time. – Abhijit Sarkar Aug 29 '21 at 00:43
@AbhijitSarkar It *is* right there, you just linked to it. Well, with the exception of *your* cmp function, but you already know that one, as you wrote it yourself! So what else do you want to see? – no comment Aug 29 '21 at 00:50
`cmp_to_key` doesn't work by writing a new function. It works by creating a new class that uses your function to implement the comparison operations (`__eq__` and so on), and returning *that class*. The idea is that classes are callable; calling the class upon the elements in your original sequence creates instances of the class, which then naturally compare in the desired way. So what you ask for is impossible; the source code already tells you everything there is to say. – Karl Knechtel Aug 29 '21 at 17:45

Karl Knechtel · Answer 2 · 2021-08-29T18:32:41.443

Alternately, one can replicate the work that the built-in cmp_to_key does, but hard-wiring the comparison logic from the original cmp function. I don't recommend this, obviously; but it is still in some sense "direct", and it highlights a few important things about Python internals.

The idea is, we create a wrapper class that implements the relational operators <, == etc. via the corresponding magic methods - __lt__, __eq__ etc. By using another tool from functools - the @total_ordering decorator - we only need to implement those two; the rest can be inferred by combining those results.

That could look like, for example:

import functools # since we're still using `total_ordering`

@functools.total_ordering
class fancy_key:
    def __init__(self, xyz):
        self.xyz = xyz
    def __lt__(self, other: fancy_key) -> bool:
        x1, y1, type1 = self.xyz
        x2, y2, type2 = other.xyz
        if x1 != x2:
            return x1 < x2
        if type1 != type2:
            return type1 == "S"
        return y2 < y1 if type1 == "S" else y1 < y2
    def __eq__(self, other: fancy_key) -> bool:
        return self.xyz == other.xyz

Notice that for __lt__ we have rewritten the comparison logic, modelled off the original _cmp - except that we return True for the cases where _cmp would return -1, and False otherwise. For __eq__, of course, we can simplify greatly; just compare the underlying data directly.

Because classes are callable, we can directly use this class as the key for a sort call. Python will use it to create instances of the wrapper for each original tuple (rather than calling a function to create transformed data for each original tuple). The instances compare in the way we want them to (because of the operator overload). Thus, we are done.

The actual cmp_to_key function (thanks to @George for the reference) generalizes this process by creating a new class on the fly, and returning that class. It uses a closure to provide the original mycmp (as it's named in that source code) to the new class, which has a hard-coded implementation of the magic methods - these all just call mycmp and check whether the result was negative, zero or positive. This implementation does not use total_ordering, presumably for efficiency; and also uses __slots__ for efficiency. It also tries to replace this code with a native C implementation, if available.

@George's original answer raised a lot of good points, so I tried to make it more concise and focused on the question being asked, and add useful references. While "how does `cmp_to_key` actually work?" was not the question asked, and questions of this sort aren't a great fit for Stack Overflow generally, it seemed possible and useful to write it up this way. — Karl Knechtel, Aug 29 '21 at 18:27
This is close to what I thought/hoped George was going to write when they [announced it](https://stackoverflow.com/questions/68968534/how-to-convert-a-python-2-comparison-function-to-a-python-3-key-function/68975795#comment121889687_68968592). Btw the first thing the [`list.sort` documentation](https://docs.python.org/3/library/stdtypes.html#list.sort) says is *"This method sorts the list in place, using only `<` comparisons"*, so supporting `<` *should* suffice here (not saying that's better). — no comment, Aug 29 '21 at 18:41
`cmp_to_key` implements all the relationals, not just lt -- but if one views a step-through of the code as I suggested in my original answer, one sees that lt is the only one ever used, so at least currently it would seem to be sufficient, yes. That applies to sorted() as well, incidentally. — George, Aug 30 '21 at 03:37
This answer is frankly actually *more* confusing than just presenting the original `cmp_to_key` code and explaining its function, and is less informative about the related "important things about Python internals" than that approach as well. An SO search for explanations of `cmp_to_key` yields answers most of which range from confusing to downright incorrect or unanswered, but one can be recommended: https://stackoverflow.com/questions/32752739/python-how-does-the-functools-cmp-to-key-function-works - read the unaccepted "answers" too for useful additional (thankfully not downvoted!) info. — George, Aug 30 '21 at 05:53

George · Answer 3 · 2021-08-29T05:49:27.997

Since the original version of this was rec'd for deletion as supposedly not actually answering the question due to it... being too long, I guess?, here's a shorter version of the exact same answer that gives less insight but uses fewer words. (Yes, it was too long. It also answered the question. "omg tl;dr" shouldn't be sufficient cause to claim that it didn't.)

The accepted answer uses the entire original tuple as the key, and simply reorders it so that the built-in sort can handle it correctly. It is, in my opinion, the best solution, as even though that's not really a "key" by any meaningful typical definition of a "key" (and it certainly doesn't give the efficiency benefits of a key-based sort) the efficiency of the sort itself is probably as close to the original compare-based sort as you can get it.

Additionally, there is another way to solve the problem of converting the comparison function to a key function, which is the way cmp_to_key solves it (as the OP alluded to). That is to use the key function to instead define a class with a single sort item (here, your tuple) as the sole member variable and define the class's __lt__() built-in (and ideally the other comparison built-ins as well) by having them call the comparison function with self as the first parameter.

To be exceedingly clear about this: that is a second solution to the problem. I don't know that it can be made any clearer that the above approach -- the approach that a Python standard library function itself uses -- does indeed solve the problem stated by the OP. Hopefully the review agrees, I suppose.

How and why does that work, and what are the pros and cons of going about it that way instead? Well, I'd explain, but it has unfortunately been deemed outside of the scope of the question to do so by the deletion review, which suggested asking a separate question to get more detail on how it works. As the OP seems more than capable and can likely figure it out themselves, I'll just leave that part out and make the reviewers happy. I've learned my lesson from how they treated the version of the question the OP posted yesterday.

(Ironically, lengthening the answer by including a code example that would basically be just a class definition and would only be useful to those that don't know how to define one already would seem to be unobjectionable, perhaps even preferable, to reviewers -- for those that need that, I'm sure there are already asked-and-answered questions elsewhere on SO regarding how to define and instantiate a class.)

Now that it is hopefully clear that an additional answer to the question was provided, I'll also note firstly that I still think the currently accepted answer is better for this specific case, and secondly that the cmp_to_key approach doesn't really create a traditional "key function" by any commonly accepted plain-face meaning of the term, and there likely really isn't any simple way to programmatically construct a "real" key function from a comparison function in the general case as is implied by the portion of the cmp_to_key docs quoted by the OP in the question itself (hopefully noting that it was specifically stated in the question is sufficient indication to reviewers that it is a relevant observation, despite it being seemingly insufficient the first time around).

This is not really an appropriate answer to the original question. While it almost gets around to addressing something in the question, it is a wall of *very* dense text, making it largely unelightening. I would recommend fashioning a question that speaks to this material directly, and then move this answer there. And doing some editing/formatting to make it more readable, including code. — Nathaniel Ford, Aug 29 '21 at 01:26
Removed the big PHP oops, and references to quicksort which weren't relevant anyway. To the other reviewers: the entry, while long, answers the question from a different perspective than the current (correct) accepted answer -- and comes to the same conclusion as that answer, as a matter of fact: that the best solution is to use the original data as the key value, just reordered to be able to be sorted natively. Moreover, it is a perspective that the author themselves requested in the accepted answer's comment thread, with regard to how such a key function relates to `cmp_to_key()`. — George, Aug 29 '21 at 04:19
Forget it. Rather than risk the deletion review I'll just rewrite the answer to state only the solution itself and take the actual (and probably more useful) discussion of the merits and drawbacks of the alternative out. Still don't know quite where "critiquing or requesting clarification from an author" comes in, though I find it ironic that the solution is to "leave a comment" when new SO contributors are barred from commenting until they hit a rep minimum and can *only* post solutions. Good thing I was neither critiquing nor requesting clarification, I guess? — George, Aug 29 '21 at 05:30
Talking about what you think the term "key function" should actually mean does not actually answer the question. For our purposes, "key function" means "a thing that can be passed to the `key` named keyword argument of the built-in Python `sort` function, such that the output is sorted in the desired order". — Karl Knechtel, Aug 29 '21 at 17:51
But yes, "discussion of merits and drawbacks" should be kept to a minimum here in answers, unless OP has explicitly requested a comparison by some objective metric (e.g., execution time or memory usage). Stack Overflow is *not a discussion forum*. I made an attempt to rewrite your answer more along the lines of what's expected here; notice that in roughly the same amount of space as your meta complaint, I manage to explain the idea in detail and include a fully worked code example. — Karl Knechtel, Aug 29 '21 at 18:29
The reason I am able to be so terse is because, rather than philosophizing about what it means to "convert" to a key function, I simply describe and emulate *what `cmp_to_key` actually does*. I also don't talk about the underlying theory of sorting (i.e., the DSU paradigm) because it *isn't necessary*; anyone who has read enough documentation to understand key-based sorting *at all* should *already understand that part*. — Karl Knechtel, Aug 29 '21 at 18:38
I don't "philosophize" about what it means to convert to a key function. A key-based sort, when presented as an alternative to comparison-based, implies a fundamentally different mechanism -- usually radix/bucketing, but certainly *something* where the usage of a key (and typically the nature of the keyspace) means a fundamentally different sort of algorithm is used. If this is somehow controversial, one needs just to look at the OP's request, "given a comparison function, show me the equivalent key function, given that they figured out a way to do it generically in functools.cmp_to_key." — George, Aug 30 '21 at 03:47
And DSU was mentioned *once* and not "talked about" any further in the original; it is an apt metaphor for what your solution does, excepting that using the decorated version as key means no need for the "undecorate" -- again, all explanations that would be useful to an OP that has explicitly expressed that they are looking for a "comparison function to key function" general solution which doesn't actually exist. @dont-talk-just-code 's argument w/the OP in the comments just proves my point -- he thinks a new function is being generated. It isn't. Explaining why is relevant to the question. — George, Aug 30 '21 at 03:58

How to convert a Python 2 comparison function to a Python 3 key function?

3 Answers3

Linked