Discussion
Given that you can just use the key
parameter, I would just ignore it for the time being.
Your algorithm for a given sequence looks like:
def middle_sort_flip_OP(seq, key=None):
result = []
length = len(seq)
seq.sort(key=key)
result = seq[:length // 2]
seq.sort(key=key, reverse=True)
result.extend(seq[:length // 2])
return result
print(middle_sort_flip_OP([1, 5, 3, 7, 3, 2, 9, 8]))
# [1, 2, 3, 3, 9, 8, 7, 5]
print(middle_sort_flip_OP([1, 5, 3, 7, 3, 2, 8]))
# [1, 2, 3, 8, 7, 5]
The second sorting step is completely unnecessary (but has the same computational complexity as a simple reversing for the Timsort algorithm implemented in Python), since you can simply slice the sorted sequence backward (making sure to compute the correct offset for the "middle" element):
def middle_sort_flip(seq, key=None):
length = len(seq)
offset = length % 2
seq.sort(key=key)
return seq[:length // 2] + seq[:length // 2 - 1 + offset:-1]
print(middle_sort_flip([1, 5, 3, 7, 3, 2, 9, 8]))
# [1, 2, 3, 3, 9, 8, 7, 5]
print(middle_sort_flip([1, 5, 3, 7, 3, 2, 8]))
# [1, 2, 3, 8, 7, 5]
Another approach which is theoretically more efficient consist in ordering the left and right side of the sequence separately. This is more efficient because each sorting step is O(N/2 log N/2)
and, when combined gives O(N log N/2)
(instead of O(N + N log N)
):
def middle_sort_half(seq, key=None):
length = len(seq)
return \
sorted(seq[:length // 2], key=key) \
+ sorted(seq[length // 2:], key=key, reverse=True)
However, those approaches either give a largely unbalanced result where the whole right side is larger than the left side (middle_sort_flip()
), or have a balancing which is dependent on the initial ordering of the input (middle_sort_half()
).
A more balanced result can be obtained by extracting and recombining the odd and even subsequences. This is simple enough in Python thanks to the slicing operations and has the same asymptotic complexity as middle_sort_flip()
but with much better balancing properties:
def middle_sort_mix(seq, key=None):
length = len(seq)
offset = length % 2
seq.sort(key=key)
result = [None] * length
result[:length // 2] = seq[::2]
result[length // 2 + offset:] = seq[-1 - offset::-2]
return result
print(middle_sort_mix([1, 5, 3, 7, 3, 2, 9, 8]))
# [1, 3, 5, 8, 9, 7, 3, 2]
print(middle_sort_mix([1, 5, 3, 7, 3, 2, 8]))
# [1, 3, 5, 8, 7, 3, 2]
Benchmarks
Speedwise they are all very similar when the key
parameter is not used, because the execution time is dominated by the copying around:
import random
nums = [10 ** i for i in range(1, 7)]
funcs = middle_sort_flip_OP, middle_sort_flip, middle_sort_half, middle_sort_mix
print(nums)
# [10, 100, 1000, 10000, 100000, 1000000]
def gen_input(num):
return list(range(num))
for num in nums:
print(f"N = {num}")
for func in funcs:
seq = gen_input(num)
random.shuffle(seq)
print(f"{func.__name__:>24s}", end=" ")
%timeit func(seq.copy())
print()
...
N = 1000000
middle_sort_flip_OP 542 ms ± 54.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_flip 510 ms ± 49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_half 546 ms ± 4.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_mix 539 ms ± 63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
On the other hand, when the key
parameter is non-trivial, your approach has a much larger amount of function calls as the other two, which may result in a significant increase in execution time for middle_sort_OP()
:
def gen_input(num):
return list(range(num))
def key(x):
return x ** 2
for num in nums:
print(f"N = {num}")
for func in funcs:
seq = gen_input(num)
random.shuffle(seq)
print(f"{func.__name__:>24s}", end=" ")
%timeit func(seq.copy(), key=key)
print()
...
N = 1000000
middle_sort_flip_OP 1.33 s ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_flip 1.09 s ± 23.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_half 1.1 s ± 27.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_mix 1.11 s ± 8.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
or, closer to your use-case:
class Container():
def __init__(self, x):
self.x = x
def get_x(self):
return self.x
def gen_input(num):
return [Container(x) for x in range(num)]
def key(c):
return c.get_x()
for num in nums:
print(f"N = {num}")
for func in funcs:
seq = gen_input(num)
random.shuffle(seq)
print(f"{func.__name__:>24s}", end=" ")
%timeit func(seq.copy(), key=key)
print()
...
N = 1000000
middle_sort_flip_OP 1.27 s ± 4.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_flip 1.13 s ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_half 1.24 s ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_mix 1.16 s ± 8.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
which seems to be a bit less dramatic.