36

The move in recent versions of Python to passing a key function to sort() from the previous cmp function is making it trickier for me to perform complex sorts on certain objects.

For example, I want to sort a set of objects from newest to oldest, with a set of string tie-breaker fields. So I want the dates in reverse order but the strings in their natural order. With a comparison function I can just reverse the comparison for the date field compared to the string fields. But with a key function I need to find some way to invert/reverse either the dates or the strings.

It's easy (although ugly) to do with numbers - just subtract them from something - but do I have to find a similar hack for dates (subtract them from another date and compare the timedeltas?) and strings (...I have no idea how I'd reverse their order in a locale-independent way).

I know of the existence of functools.cmp_to_key() but it is described as being "primarily used as a transition tool for programs being converted to Python 3 where comparison functions are no longer supported". This implies that I should be able to do what I want with the key method - but how?

martineau
  • 119,623
  • 25
  • 170
  • 301
Kylotan
  • 18,290
  • 7
  • 46
  • 74
  • 1
    Passing reverse=True just changes which of the sub-keys I need to find a way to invert. Imagine (for some strange reason) I wanted to sort by first name ascending and last name descending, for example. Passing reverse=True just means I have to find a way to invert the first name now, instead of the last name. – Kylotan Jun 26 '12 at 12:14
  • http://python3porting.com/preparing.html#keycmp-section – wim Jun 26 '12 at 13:32
  • It's not clear how that link helps. – Kylotan Jun 26 '12 at 14:20
  • 1
    possible duplicate of [Python: List Sorting with Multiple Attributes and Mixed Order](http://stackoverflow.com/questions/1516249/python-list-sorting-with-multiple-attributes-and-mixed-order) – Jesse May 10 '13 at 18:55
  • 3
    A better title would be **How to sort non-numeric tuples with different orders?** – Martin Thoma Aug 23 '17 at 11:57
  • May I add an example? I've just prepared a question when I found yours. – Martin Thoma Aug 23 '17 at 11:57
  • Does this answer your question? [sort Python list with two keys but only one in reverse order](https://stackoverflow.com/questions/37693373/sort-python-list-with-two-keys-but-only-one-in-reverse-order) – user202729 Aug 14 '21 at 13:32

7 Answers7

26

The most generic way to do this is simply to sort separately by each key in turn. Python's sorting is always stable so it is safe to do this:

sort(data, key=tiebreakerkey)
sort(data, key=datekey, reverse=True)

will (assuming the relevant definitions for the key functions) give you the data sorted by descending date and ascending tiebreakers.

Note that doing it this way is slower than producing a single composite key function because you will end up doing two complete sorts, so if you can produce a composite key that will be better, but splitting it out into separate sorts gives a lot of flexibility: given a key function for each column you can make any combination of them and specify reverse for any individual column.

For a completely generic option:

keys = [ (datekey, True), (tiebreakerkey, False) ]
for key, rev in reversed(keys):
    sort(data, key=key, reverse=rev)

and for completeness, though I really think it should be avoided where possible:

from functools import cmp_to_key
sort(data, key=cmp_to_key(your_old_comparison_function))

The reason I think you should avoid this you go back to having n log n calls to the comparison function compared with n calls to the key function (or 2n calls when you do the sorts twice).

Duncan
  • 92,073
  • 11
  • 122
  • 156
  • +1 for the same reason as katrielalex: it does address the problem, although I don't think it counts as 'a' key function, and I think it is less elegant than a comparison function. – Kylotan Jun 26 '12 at 12:28
  • Even though it is slower than a composite key function it will probably still be a lot faster than a comparison function. It is also possible to write a cmp_to_key wrapper that will wrap any comparison function up to be used as a key function instead, but that's really messy – Duncan Jun 26 '12 at 12:30
  • There's a cmp_to_key in the functools module so you could argue that's cleaner than having to generalise to multiple sorting passes. But mostly I'm just surprised that the `cmp` option has been removed when it's clearly the cleaner way to perform some sorts. – Kylotan Jun 26 '12 at 12:36
  • `cmp` is much slower than using `key` especially when you have custom comparison. With a `key` function the custom Python code is called `n` times and then (usually) much faster builtin code used to compare a simple object or tuple of objects for `n log n` comparisons. With `cmp` you get the full `n log n` calls to the Python code. – Duncan Jun 26 '12 at 12:39
  • 1
    I appreciate the speed differences, but it's rare for Python to implement optimisations that make the code significantly uglier. Why not just leave in the 2 code paths and document that `key` is faster? – Kylotan Jun 26 '12 at 12:47
  • Python 3 dropped `cmp` and `__cmp__` altogether, in particular `sort` now only requires `__lt__` to be defined for the keys. This has advantages if you want to sort objects which cannot implement a full comparison as well as reducing the scope for errors when `__lt__` and `__cmp__` produce inconsistent results. There's an explanation here: http://python3porting.com/problems.html#comparisons – Duncan Jun 26 '12 at 12:55
  • 2
    http://wiki.python.org/moin/HowTo/Sorting/#Sort_Stability_and_Complex_Sorts specifically recommends this method for complex sorting, and notes that *the Timsort algorithm used in Python does multiple sorts efficiently because it can take advantage of any ordering already present in a dataset*. – ecatmur Jun 26 '12 at 12:59
18

The slow-but-elegant way to do this is to create a value wrapper that has reversed ordering:

from functools import total_ordering
@total_ordering
class ReversedOrder:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return other.value == self.value
    def __lt__(self, other):
        return other.value < self.value

If you don't have functools.total_ordering, you'd have to implement all 6 comparisons, e.g.:

import operator
class ReversedOrder:
    def __init__(self, value):
        self.value = value
for x in ['__lt__', '__le__', '__eq__', '__ne__', '__ge__', '__gt__']:
    op = getattr(operator, x)
    setattr(ReversedOrder, x, lambda self, other, op=op: op(other.value, self.value))
ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • 1
    If the only use for `ReversedOrder` is for use in a sort key then you only need to implement `__lt__` and can safely ignore all of the others. – Duncan Jun 26 '12 at 12:50
  • @Duncan where is that documented? I can't find it in the reference. – ecatmur Jun 26 '12 at 12:55
  • 1
    See the Python 3 release notes http://docs.python.org/release/3.0.1/whatsnew/3.0.html#ordering-comparisons . I don't know where if anywhere it is in the main docs though. – Duncan Jun 26 '12 at 12:57
  • Going to accept this answer, because the other posts certainly solve the problem, but this one most closely answers the question. – Kylotan Jun 27 '12 at 11:28
  • 1
    @Duncan: There is an explicit guarantee in [`list.sort`'s docs](https://docs.python.org/3/library/stdtypes.html#list.sort): "This method sorts the list in place, using only `<` comparisons between items." [The Sorting HOW TO](https://docs.python.org/3/howto/sorting.html#odd-and-ends) explicitly adds that "The sort routines are guaranteed to use `__lt__()` when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an `__lt__()` method:" ("sort routines" referring to both `list.sort` and `sorted` in this context). – ShadowRanger Oct 19 '20 at 16:40
12

I think the docs are incomplete. I interpret the word "primarily" to mean that there are still reasons to use cmp_to_key, and this is one of them. cmp was removed because it was an "attractive nuisance:" people would gravitate to it, even though key was a better choice.

But your case is clearly better as a cmp function, so use cmp_to_key to implement it.

Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
  • +1 I agree with this answer, nothing wrong with using the solution already provided by functools in those (rarer) cases where using a cmp function is the more elegant way. – wim Jun 26 '12 at 13:34
  • I wonder why this function wasn't just overloaded for the two use cases, even with the equivalent mapping structure being possible. – jxramos Jul 14 '17 at 22:46
6

Sort twice, once on each key and once reversed.

(Python sort is stable; that is, it doesn't change the order of the original list unless it has to.)

It does matter which order you do the sorts in, if you care about how equal elements get sorted.

Katriel
  • 120,462
  • 19
  • 136
  • 170
  • Nice idea. Still, I'd consider that a bit hacky compared to a custom comparison function, and it doesn't work well in my code base (where I pick a sort key based on input and then call `sort()` once with it), but it is certainly a way around the problem. – Kylotan Jun 26 '12 at 12:27
2

One way is to use pandas library and args ascending, setting the columns you want to sort ascending and the columns you want descending by doing e.g. ascending=[True,False,False]

You can do that not only for two levels (e.g. datetime and str) but to any number of levels needed.

For example, if you have

d = [[1, 2, datetime(2017,1,2)], 
     [2, 2, datetime(2017,1,4)],
     [2, 3, datetime(2017,1,3)],
     [2, 3, datetime(2017,1,4)], 
     [2, 3, datetime(2017,1,5)], 
     [2, 4, datetime(2017,1,1)], 
     [3, 1, datetime(2017,1,2)]]

You can setup your df

df = pd.DataFrame(d)

and use sort_values

sorted_df = df.sort_values(by=[0,1,2], ascending=[True,False,False])
sorted_list = sorted_df.agg(list, 1).tolist()


[[1, 2, Timestamp('2017-01-02 00:00:00')],
 [2, 4, Timestamp('2017-01-01 00:00:00')],
 [2, 3, Timestamp('2017-01-05 00:00:00')],
 [2, 3, Timestamp('2017-01-04 00:00:00')],
 [2, 3, Timestamp('2017-01-03 00:00:00')],
 [2, 2, Timestamp('2017-01-04 00:00:00')],
 [3, 1, Timestamp('2017-01-02 00:00:00')]]

Notice that the first column is sorted ascending, and the second and third are descending, which is of course due to setting ascending=[True,False,False].

rafaelc
  • 57,686
  • 15
  • 58
  • 82
0

For String, you can use some commonly acknowledged maximum value (such as 2^16 or 2^32) and use chr(), unicode(), ord() to do the math, just like for integers.

In one of my work, I know I deal with strings in utf8 and their ordinals are below 0xffff, so I wrote:

def string_inverse(s):
    inversed_string = ''
    max_char_val = 0xffff
    for c in s:
        inversed_string += unicode(max_char_val-ord(c))
    return inversed_string        

result.sort(key=lambda x:(x[1], string_inverse(x[0])), reverse=True)

x is of type: (string, int), so what I get is, to abuse the SQL:

select * from result order by x[1] desc, x[0] asc;
Kun Wu
  • 329
  • 2
  • 8
0

try this:

>>> import functools
>>> reverse_key = functools.cmp_to_key(lambda a, b: (a < b) - (a > b))
>>> reverse_key(3) < reverse_key(4)
False
>>> reverse_key(3) > reverse_key(4)
True
>>> reverse_key('a') < reverse_key('b')
False
Yankai Zhang
  • 161
  • 3
  • 9