Low Complexity Python Sort (Small - Large - Small) by a Class Attribute

Question

I am looking to sort my list(class) into the order of small - large - small, for example if it were purely numeric and the list was [1,5,3,7,7,3,2] the sort would look like [1,3,7,7,5,3,2].

The basic class structure is:

class LaneData:
    def __init__(self):
        self.Name = "Random"
        self.laneWeight = 5

So essentially the sort function would work from the LaneData.laneWeight variable. I found this answer here which I'm not entirely sure if it will work in this instance using a class as the list variable.

My second idea is like so (pseudo code below) [inspired by]:

def sortByWeight(e):
    return e.getLaneWeight()

newList = [] # create a new list to store
lanes.sort(key=sortByWeight) # python default sort by our class attrib
newList = lanes[:len(lanes) // 2] # get the first half of the sorted list (low-high)
lanes.sort(key=sortByWeight, reverse=True) # reverse the sort
newList = newList + lanes[:len(lanes) // 2] # get the first half of the sorted list (high-low)

If possible I'd like to keep the sort small and efficient, I don't mind building a small algorithm for it if needs be. What's your thoughts team?

You don't actually need to use `sort` the second time (though the built-in sort is efficient for already-sorted lists). `newList = lanes[:n] + list(reversed(lanes[n:]))` would suffice. — chepner, Feb 10 '21 at 20:21
There are a couple of issues with this question: 1. it is unclear what criterion should be used for your algorithm; 2. the pseudo-code you provide should really be some [MWE](/help/minimal-reproducible-example); 3. the alogorithm you propose does not work on your example; 3. you should abstract your algorithmic problem properly (i.e. separate the class attribute access from the rest of the problem statement, as this is largely irrelevant). — norok2, Feb 10 '21 at 21:43
@chepner I can see what you're saying, but how can this work on the class attribute? — Hex, Feb 10 '21 at 22:34
@norok2 Thanks for the feedback. 1. The criterion is simple, reorder the list so that the largest numbers are in the middle of list, and the further out lower/higher (index) you go the numbers get smaller. The exact balance either side isn't an issue, more so the basic idea of small - big - small. 3. I believe I stated that, is there something I missed? I felt it worth mentioning as its an example demonstrating what I am attempting in a more basic form. — Hex, Feb 10 '21 at 22:34

score 2 · Answer 1 · answered Feb 10 '21 at 20:24

2

Your solution works but you sort the end of the list in ascending order first and in descending order afterwards.

You could optimize it by :

Looking for the index of the max, swap the max with the element in the middle position and finally sort separately the first half of the table (in ascending order) and the second half (in descending order).
Instead of doing a reverse sort of the second half, you can simply reverse it (with the reverse function). This is less complex (O(n) instead of O(nlog(n))

answered Feb 10 '21 at 20:24

Fabien Laurent

21
1

I'm pretty sure the built-in sort is already O(n) for sorted inputs, as part of the algorithm involves looking for monotonic subsequences in the input. – chepner Feb 10 '21 at 20:30
Yeah I believe your first optimisation should work, at current you are correct that it works, except it has the potential to skip one list item due to the division by two. – Hex Feb 10 '21 at 22:52

norok2 · Answer 2 · 2021-02-11T19:00:27.253

Discussion

Given that you can just use the key parameter, I would just ignore it for the time being.

Your algorithm for a given sequence looks like:

def middle_sort_flip_OP(seq, key=None):
    result = []
    length = len(seq)
    seq.sort(key=key)
    result = seq[:length // 2]
    seq.sort(key=key, reverse=True)
    result.extend(seq[:length // 2])
    return result
    

print(middle_sort_flip_OP([1, 5, 3, 7, 3, 2, 9, 8]))
# [1, 2, 3, 3, 9, 8, 7, 5]
print(middle_sort_flip_OP([1, 5, 3, 7, 3, 2, 8]))
# [1, 2, 3, 8, 7, 5]

The second sorting step is completely unnecessary (but has the same computational complexity as a simple reversing for the Timsort algorithm implemented in Python), since you can simply slice the sorted sequence backward (making sure to compute the correct offset for the "middle" element):

def middle_sort_flip(seq, key=None):
    length = len(seq)
    offset = length % 2
    seq.sort(key=key)
    return seq[:length // 2] + seq[:length // 2 - 1 + offset:-1]
    

print(middle_sort_flip([1, 5, 3, 7, 3, 2, 9, 8]))
# [1, 2, 3, 3, 9, 8, 7, 5]
print(middle_sort_flip([1, 5, 3, 7, 3, 2, 8]))
# [1, 2, 3, 8, 7, 5]

Another approach which is theoretically more efficient consist in ordering the left and right side of the sequence separately. This is more efficient because each sorting step is O(N/2 log N/2) and, when combined gives O(N log N/2) (instead of O(N + N log N)):

def middle_sort_half(seq, key=None):
    length = len(seq)
    return \
        sorted(seq[:length // 2], key=key) \
        + sorted(seq[length // 2:], key=key, reverse=True)

However, those approaches either give a largely unbalanced result where the whole right side is larger than the left side (middle_sort_flip()), or have a balancing which is dependent on the initial ordering of the input (middle_sort_half()).

A more balanced result can be obtained by extracting and recombining the odd and even subsequences. This is simple enough in Python thanks to the slicing operations and has the same asymptotic complexity as middle_sort_flip() but with much better balancing properties:

def middle_sort_mix(seq, key=None):
    length = len(seq)
    offset = length % 2
    seq.sort(key=key)
    result = [None] * length
    result[:length // 2] = seq[::2]
    result[length // 2 + offset:] = seq[-1 - offset::-2]
    return result


print(middle_sort_mix([1, 5, 3, 7, 3, 2, 9, 8]))
# [1, 3, 5, 8, 9, 7, 3, 2]
print(middle_sort_mix([1, 5, 3, 7, 3, 2, 8]))
# [1, 3, 5, 8, 7, 3, 2]

Benchmarks

Speedwise they are all very similar when the key parameter is not used, because the execution time is dominated by the copying around:

import random


nums = [10 ** i for i in range(1, 7)]
funcs = middle_sort_flip_OP, middle_sort_flip, middle_sort_half, middle_sort_mix
print(nums)
# [10, 100, 1000, 10000, 100000, 1000000]


def gen_input(num):
    return list(range(num))


for num in nums:
    print(f"N = {num}")
    for func in funcs:
        seq = gen_input(num)
        random.shuffle(seq)
        print(f"{func.__name__:>24s}", end="  ")
        %timeit func(seq.copy())
    print()

...
N = 1000000
     middle_sort_flip_OP  542 ms ± 54.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        middle_sort_flip  510 ms ± 49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        middle_sort_half  546 ms ± 4.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
         middle_sort_mix  539 ms ± 63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

On the other hand, when the key parameter is non-trivial, your approach has a much larger amount of function calls as the other two, which may result in a significant increase in execution time for middle_sort_OP():

def gen_input(num):
    return list(range(num))


def key(x):
    return x ** 2


for num in nums:
    print(f"N = {num}")
    for func in funcs:
        seq = gen_input(num)
        random.shuffle(seq)
        print(f"{func.__name__:>24s}", end="  ")
        %timeit func(seq.copy(), key=key)
    print()

...
N = 1000000
     middle_sort_flip_OP  1.33 s ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        middle_sort_flip  1.09 s ± 23.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        middle_sort_half  1.1 s ± 27.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
         middle_sort_mix  1.11 s ± 8.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

or, closer to your use-case:

class Container():
    def __init__(self, x):
        self.x = x
        
    def get_x(self):
        return self.x

    
def gen_input(num):
    return [Container(x) for x in range(num)]


def key(c):
    return c.get_x()


for num in nums:
    print(f"N = {num}")
    for func in funcs:
        seq = gen_input(num)
        random.shuffle(seq)
        print(f"{func.__name__:>24s}", end="  ")
        %timeit func(seq.copy(), key=key)
    print()

...
N = 1000000
     middle_sort_flip_OP  1.27 s ± 4.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        middle_sort_flip  1.13 s ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        middle_sort_half  1.24 s ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
         middle_sort_mix  1.16 s ± 8.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

which seems to be a bit less dramatic.

Low Complexity Python Sort (Small - Large - Small) by a Class Attribute

2 Answers2

Discussion

Benchmarks