Custom sort order in Python

Question

Am I able to define my own sort order and still use the Python sort function? For example, say my desired sorting order is:

list1 = ['Stella', 'Bob', 'Joe', 'Betty']

I'd hoped to do something like:

list2 = ['Joe', 'Stella']
sorted(list2, key=list1)

and get:

['Stella', 'Joe']

where the sorted order stays within the custom order in list1. I can do it in multiple steps by swapping out for numeric values but thought there may be a way to do it similar to above.

Selcuk · Answer 1 · 2020-07-28T00:00:11.843

5

Note that for the example you have given you are wasting time sorting list2 as you can simply use the already sorted list1 if the items in list2 are unique:

print([e for e in list1 if e in list2])

should print

['Stella', 'Joe']

To improve it even more, you can use a set for faster lookups:

set2 = set(list2)
print([e for e in list1 if e in set2])

This second variant works in O(n) which is faster than any sorting algorithm.

Edit: The answers given by Nick and Jab are close to O(n*m + n*log n). This is a O(n * log n) solution that works for non-unique cases:

lookup_dict = {k: i for i, k in enumerate(list1)}
print(sorted(list2, key=lookup_dict.get))

edited Jul 28 '20 at 00:00

answered Jul 27 '20 at 01:50

Selcuk

57,004
12
102
110

Subject to the unique caveat this is a great solution. – Nick Jul 27 '20 at 01:52
@Selcuk, In my particular case, they won't always be unique, which is why I marked answer below. Nice method though in unique case. – sobrio35 Jul 27 '20 at 02:10
@sobrio35 Fair enough. I have added an alternative solution which should work in all cases for future reference. – Selcuk Jul 27 '20 at 02:47
1

I've shown an O(n) solution that works for non-unique cases – Mad Physicist Jul 27 '20 at 15:47

score 2 · Accepted Answer · edited Jul 27 '20 at 03:10

2

There's no need to use a lambda in the sorted function. Just supply it the index function of list1:

sorted(list2, key=list1.index)

edited Jul 27 '20 at 03:10

Mad Physicist

107,652
25
181
264

answered Jul 27 '20 at 01:56

Jab

26,853
21
75
114

This is not very efficient, but quick to write and good enough for most use cases. – Mad Physicist Jul 27 '20 at 03:11

Mad Physicist · Answer 3 · 2020-07-27T18:06:00.390

If list1 and list2 are both guaranteed to contain only unique elements, you can turn list2 into a set and select from list1 based on it:

set2 = set(list2)
list2_sorted = [x for x in list1 if x in set2]

If, on the other hand list2 contains non-unique elements, you can use collections.Counter to do the same thing while maintaining optimal algorithmic complexity.

counter2 = Counter(list2)
list2_sorted = []
for x in list1:
    list2_sorted.extend([x] * counter2[x])

The algorithmic complexity is the same here because python's list growing is amortized to provide O(n) behavior in the long term, and the keys of a Counter, or any other dict, are implemented as a hash-table with O(1) lookup, just like a set.

You can use a nested comprehension instead of the loop:

list2_sorted = [x for x in list1 for _ in range(counter2[x])]

You can use the set-like behavior of dictionaries to maintain O(n) behavior even if list1 contains duplicates, as long as the sort order is determined only by the first occurrence. In Python 3.6+ dictionaries are ordered, so you could use plain dict. Prior versions would require you to use collections.OrderedDict to obtain the same behavior:

list1_keys = OrderedDict.fromkeys(list1)
list2_sorted = [x for x in list1_keys for _ in range(counter2[x])]

The reason to use a dict when you are looking for set-like behavior is that set is not ordered.

@Nick. O(n)-ish depending on how you define n. It's a hash-table lookup for each element. — Mad Physicist, Jul 27 '20 at 03:07
Yeah but you have to generate the hash table too... although I suppose that is also O(n) — Nick, Jul 27 '20 at 03:08
@Nick. It is. The n has a large load factor, but it's about how it responds to larger inputs, not how practical it actually is. — Mad Physicist, Jul 27 '20 at 03:09
@Nick. I've added an example for O(n) if both lists contain duplicates — Mad Physicist, Jul 27 '20 at 15:53

score 1 · Answer 4 · answered Jul 27 '20 at 01:46

1

You can use a lambda function in sorted, taking list1.index() of the value to get a sort order:

list1 = ['Stella', 'Bob', 'Joe', 'Betty']

list2 = ['Joe', 'Stella']
sorted(list2, key=lambda v:list1.index(v))

Output:

['Stella', 'Joe']

answered Jul 27 '20 at 01:46

Nick

138,499
22
57
95

This is O(n^2). – Mad Physicist Jul 27 '20 at 02:08
@MadPhysicist as is the accepted answer. If you have a better solution for a `list2` which may not be unique, please post it... – Nick Jul 27 '20 at 02:10
I've posted an answer. Thanks for pointing out this important corner case. – Mad Physicist Jul 27 '20 at 03:00

Custom sort order in Python

4 Answers4