To sort by a key in Python 2.3 or lower, you could use the cmp
parameter. But sometimes key
style sorting is easier to read; and in any case, it does less work, since cmp
will be called O(n log n) times, while the key
function will be called only O(n) times.
With that in mind, here's a way to reproduce the behavior of the key
parameter in later versions of Python. It uses the decorate-sort-undecorate idiom, a.k.a. the Schwartzian Transform. This won't be quite as space efficient because it makes copies, but for large lists, it is likely to be more time efficient. I've named this sorted
because it roughly reproduces the sorted
function added in 2.4; check the python version and conditionally import this so that you don't smash the built-in sorted
in newer versions -- or just rename it.
def sorted(seq, key=lambda x: None, reverse=False):
seq = [(key(x), i, x) for i, x in enumerate(seq)]
seq.sort()
if reverse:
seq.reverse()
return [x for k, i, x in seq]
Note that enumerate
is only necessary if you care about having a stable sort on unequal values with equal keys; it slows down the function by a hair. Tested on your data:
>>> key=lambda x: (x.count('YES'), x.count('MAYBE'), x.count('NO'))
>>> my_sorted(mylist, key=key, reverse=True)
[['ITEM C', 'YES', 'YES', 'YES', 'YES', 'NO', 'NO', 'MAYBE', 'NO', 'MAYBE'],
['ITEM B', 'YES', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'NO', 'MAYBE'],
['ITEM A', 'YES', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'NO', 'NO']]
You might also consider using a dictionary to do your counting; that way, only one pass is required. However, count
is sufficiently optimized that three passes are still faster than one Python for
loop, at least on my machine. So only use this if you need to count lots of values. I'll leave this here for posterity:
def my_key(inner_list):
counts = {'YES':0, 'MAYBE':0, 'NO':0}
for i in inner_list:
if i in counts:
counts[i] += 1
return (counts['YES'], counts['MAYBE'], counts['NO'])
I did some testing; apologies for the long post. The below is only for the curious and inquisitive.
My tests indicate that on the smaller list, decorate, sort, undecorate is already faster than using the built-in sort + cmp
. On a bigger list, the difference becomes more dramatic. Definitions:
def key_count(x):
return (x.count('YES'), x.count('MAYBE'), x.count('NO'))
def key_dict(inner_list):
counts = {'YES':0, 'MAYBE':0, 'NO':0}
for i in inner_list:
if i in counts:
counts[i] += 1
return (counts['YES'], counts['MAYBE'], counts['NO'])
def decorate_sort(seq, key=lambda x: None, reverse=False):
seq = [(key(x), i, x) for i, x in enumerate(seq)]
seq.sort()
if reverse:
seq.reverse()
return [x for k, i, x in seq]
def builtin_sort(seq, key, reverse=False):
seq.sort(lambda p, q: cmp(key(p), key(q)))
if reverse:
seq.reverse()
Tests:
>>> mylist = [
... ['ITEM A', 'YES', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'NO', 'NO'],
... ['ITEM B', 'YES', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'NO', 'MAYBE'],
... ['ITEM C', 'YES', 'YES', 'YES', 'YES', 'NO', 'NO', 'MAYBE', 'NO', 'MAYBE']
... ]
>>> %timeit decorate_sort(mylist, key=key_count, reverse=True)
100000 loops, best of 3: 5.03 us per loop
>>> %timeit builtin_sort(mylist, key=key_count, reverse=True)
100000 loops, best of 3: 5.28 us per loop
The built-in version is already slower! The less-generalized version mylist.sort(lambda p, q: -cmp(key(p), key(q)))
is a better by a hair for a short list, because of the addition of enumerate
to decorate_sort
; without it, decorate_sort
is faster (4.28 us per loop in my previous test):
>>> %timeit mylist.sort(lambda p, q: -cmp(key_count(p), key_count(q)))
100000 loops, best of 3: 4.74 us per loop
Using key_dict
is a mistake in this case, though:
>>> %timeit decorate_sort(mylist, key=key_dict, reverse=True)
100000 loops, best of 3: 8.97 us per loop
>>> %timeit builtin_sort(mylist, key=key_dict, reverse=True)
100000 loops, best of 3: 11.4 us per loop
Testing it on a larger list, basically the same results hold:
>>> import random
>>> mylist = [[random.choice(('YES', 'MAYBE', 'NO')) for _ in range(1000)]
for _ in range(100)]
>>> %timeit decorate_sort(mylist, key=key_count, reverse=True)
100 loops, best of 3: 6.93 ms per loop
>>> %timeit builtin_sort(mylist, key=key_count, reverse=True)
10 loops, best of 3: 34.5 ms per loop
The less generalized version is now slower than decorate_sort
.
>>> %timeit mylist.sort(lambda p, q: -cmp(key_count(p), key_count(q)))
100 loops, best of 3: 13.5 ms per loop
And key_dict
is still slower. (But faster than builtin_sort
!)
>>> %timeit decorate_sort(mylist, key=key_dict, reverse=True)
10 loops, best of 3: 20.4 ms per loop
>>> %timeit builtin_sort(mylist, key=key_dict, reverse=True)
10 loops, best of 3: 103 ms per loop
So the upshot is that the Schwartzian Transform provides a solution that is both faster and more generalized -- a rare and wonderful combination.