Generalizing custom sorting function to multiple columns (ascending/descending order)

Question

This question is based on previous Stackoverflow answer. The problem seems too big for me.

I now have sorting function sort_multi(lst, index_normal, index_reversed) which takes input list, and two parameters which indicates which column is sorted in normal order and which in reverse:

from pprint import pprint
from itertools import groupby, chain

l = ((1, 'b'),
     (1, 'd'),
     (2, 'a'),
     (1, 'a'))

def sort_multi(lst, index_normal, index_reversed):
    return list(chain.from_iterable([sorted(list(j), key=lambda v:v[index_reversed], reverse=True) for i, j in groupby(sorted(lst), key=lambda v:v[index_normal])]))

pprint(sort_multi(l, 0, 1), width=10)  # index 0 is sorted normally, 1 in reversed order.

Output:

[(1, 'd'),
 (1, 'b'),
 (1, 'a'),
 (2, 'a')]

What I want is extend the function to work for any number of indices. Eg.

For example input of 3 columns:

l = ((1, 'c', 'aa'),
     (2, 'b', 'bb'),
     (1, 'c', 'cc'),
     (1, 'd', 'dd'))

sort_multi(l, reversed=(False, True, True))

Gives output:

[(1, 'd', 'dd'),
 (1, 'c', 'cc'),
 (1, 'c', 'aa'),
 (2, 'b', 'bb')]

The tuples can contains strings, integers, ...

But doesn't the [answer on the previous question](https://stackoverflow.com/a/51252873/953482) already do this? Can't you pass however many values you want to the `ascending` argument? — Kevin, Jul 10 '18 at 12:38
I don' t want to use pandas, just builtin modules. I'm not sure if it's possible to do it efficiently (maybe with recursion?) — Andrej Kesely, Jul 10 '18 at 12:44

Kevin · Accepted Answer · 2018-07-10T13:20:37.760

Recursive approach: sort the list by the first criteria. Then group by that criteria, and recursively sort each sub-group using the remaining criteria.

from operator import itemgetter
import itertools

def multisorted(seq, indices, reversed_states):
    if len(indices) == 0:
        return seq
    index = indices[0]
    reversed = reversed_states[0]
    partially_sorted_seq = sorted(seq, key = itemgetter(indices[0]), reverse=reversed_states[0])
    result = []
    for key, group in itertools.groupby(partially_sorted_seq, key=itemgetter(indices[0])):
        result.extend(multisorted(group, indices[1:], reversed_states[1:]))
    return result

d = ((1, 'c', 'aa'),
     (2, 'b', 'bb'),
     (1, 'c', 'cc'),
     (1, 'd', 'dd'))

print(multisorted(d, (0,1,2), (False, True, True)))

Result:

[(1, 'd', 'dd'), (1, 'c', 'cc'), (1, 'c', 'aa'), (2, 'b', 'bb')]

Approach #2. sort and sorted can already sort by multiple criteria out-of-the-box. We only need to change the key so that some of the elements of each tuple have reversed logic in regards to inequality comparison.

from functools import total_ordering

@total_ordering
class Negated:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return self.value == other.value
    def __lt__(self, other):
        return self.value > other.value

def multisorted(seq, ordering):
    return sorted(seq, key = lambda t: [Negated(x) if reversed else x for x, reversed in zip(t, ordering)])

d = ((1, 'c', 'aa'),
     (2, 'b', 'bb'),
     (1, 'c', 'cc'),
     (1, 'd', 'dd'))

print(multisorted(d, (False, True, True)))

Thanks, the class method is interesting. TIL – Andrej Kesely Jul 10 '18 at 14:12 — Andrej Kesely, Jul 10 '18 at 14:12

Dusan Gligoric · Answer 2 · 2018-07-10T14:00:57.323

2

You can do something way simpler like:

from operator import itemgetter

def sort_any_multi(lst, index_order_reversed):
    for index, reverse in reversed(index_order_reversed):
       lst = sorted(lst, key=itemgetter(index), reverse=reverse)
    return lst

r = ((1, 'c', 'aa'),
     (2, 'b', 'bb'),
     (1, 'c', 'cc'),
     (1, 'd', 'dd'))

print(sort_any_multi(r, [[0, False], [1, True], [2, True]]))

Gives off:

[(1, 'd', 'dd'), (1, 'c', 'cc'), (1, 'c', 'aa'), (2, 'b', 'bb')]

Trick is in sorting from the last priority index to the highest priority one.

edited Jul 10 '18 at 14:00

answered Jul 10 '18 at 13:36

Dusan Gligoric

582
3
21

Yep, this is a perfectly serviceable approach. I suspect it's just a little inefficient since each sort goes over the entire list, where a recursive approach might call more sorts, but on smaller subsets. It's hard to formally prove that one has a better complexity than the other, though. – Kevin Jul 10 '18 at 14:15
1

@Kevin Ran 10 tests for 1 million 3 dimension integer array, code I pasted was 25% faster to my amazement. Your first example averaged around 4s, 2nd around 9s, mine did around 3s. But yes, estimations, estimations, I guess if not heavy production one should rely on what he can understand best. – Dusan Gligoric Jul 10 '18 at 14:32

Generalizing custom sorting function to multiple columns (ascending/descending order)

2 Answers2