
I am practicing merge sort and am curious whether my second version is better than the first. It seems to be, at least in terms of memory, since I am popping from a list instead of just moving indices.

Version 1:

def mergesort(L):
    if len(L)<=1: return L
    pivot=len(L)//2
    left=mergesort(L[:pivot])
    right=mergesort(L[pivot:])
    i=j=0
    sortedArr=[]
    while i<len(left) and j<len(right):
        if left[i]<right[j]:
            sortedArr.append(left[i])
            i+=1
        else:
            sortedArr.append(right[j])
            j+=1
    return sortedArr + left[i:] + right[j:]

Version 2:

def mergesort(L):
    if len(L)<=1: return L
    pivot=len(L)//2
    left=mergesort(L[:pivot])
    right=mergesort(L[pivot:])
    sortedArr=[]
    while left!=[] and right!=[]:
        if left[0]<right[0]:
            sortedArr.append(left.pop(0))
        else:
            sortedArr.append(right.pop(0))
    return sortedArr + left + right

Without getting into parallelizing, is there any way to further improve upon Version 2, assuming it is superior to Version 1? How would I describe the memory requirements of these two versions so far?

MyNameIsKhan
  • Popping from the front of a list is going to be horribly inefficient. – Mark Ransom Mar 22 '13 at 20:25
  • In terms of memory, you're probably better with version2, but with computation, each `pop` requires python to shift the elements in memory of the list to the left by 1 which will be quite inefficient. I suppose you could reverse the lists and pop off the end to fix that ... – mgilson Mar 22 '13 at 20:26
  • @mgilson Is reversing a list expensive? – MyNameIsKhan Mar 22 '13 at 20:31
  • It should be O(N) in pretty optimized python code. (and you do it by `lst[::-1]`). – mgilson Mar 22 '13 at 20:33
  • So I am assuming that the cost of reversal is O(N) but popping a list is also O(N) since it has to shift everything down one spot, hence costly if we need to pop every element from L at some point (effectively O(N^2)?) – MyNameIsKhan Mar 22 '13 at 20:35
  • Right. Popping every element from the left of `L` is an O(N*N) operation. – mgilson Mar 22 '13 at 20:35
  • And so if we go with Version 1, is the extra memory requirement O(2N) because we have the sorted left and right lists (size N/2 and N/2) and then the merged result (size N)? Version 2 would have O(N) memory requirement but at the expense of speed? Would it have O(1) auxiliary space if I instead split the initial list into two lists left and right (instead of making extras) and then sorted each with the pop method? – MyNameIsKhan Mar 22 '13 at 20:39
  • 2
    Note that you are using O(NlogN) extra memory during the recursive calls. If you want to have a memory efficient version you should never slice the original list, and instead pass around the starting and ending indexes of the slice. – Bakuriu Mar 22 '13 at 21:26
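
A rough sketch of the trick mgilson describes in the comments above (an illustration, not the asker's code): reverse each sorted half so that pop() removes the smallest remaining element from the end of the list in O(1), instead of pop(0) shifting everything over in O(n). The slices still cost extra memory, as Bakuriu notes.

def mergesort(L):
    if len(L) <= 1:
        return L
    pivot = len(L) // 2
    # reverse each sorted half so the smallest element sits at the end
    left = mergesort(L[:pivot])[::-1]
    right = mergesort(L[pivot:])[::-1]
    sortedArr = []
    while left and right:
        if left[-1] < right[-1]:
            sortedArr.append(left.pop())   # pop() from the end is O(1)
        else:
            sortedArr.append(right.pop())
    # whichever half is left over is still reversed; flip it back
    return sortedArr + left[::-1] + right[::-1]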

2 Answers


Why not use a deque from collections? It would lower the cost of the popleft() operation.
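
A minimal sketch of that idea (my illustration, not Dvx's code), keeping the structure of the asker's Version 2: wrapping each sorted half in a collections.deque makes popleft() O(1), whereas list.pop(0) is O(n).

from collections import deque

def mergesort(L):
    if len(L) <= 1:
        return L
    pivot = len(L) // 2
    # wrap the sorted halves in deques so popping from the front is cheap
    left = deque(mergesort(L[:pivot]))
    right = deque(mergesort(L[pivot:]))
    sortedArr = []
    while left and right:
        if left[0] < right[0]:
            sortedArr.append(left.popleft())  # O(1) on a deque
        else:
            sortedArr.append(right.popleft())
    sortedArr.extend(left)   # at most one of the deques is non-empty here
    sortedArr.extend(right)
    return sortedArr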

Dvx

For a list L, the slicing operation L[:n] is O(n) in time and O(n) in space in Python (it creates a new list of n elements).

Given lists a, b, c, the concatenation a + b + c is O(n) in time and space, where n = len(a) + len(b) + len(c) (it also creates a new list of n elements).

Thus each call to mergesort() requires T(n) = 2T(n/2) + O(n) time and space, i.e., T(n) = O(n*log(n)).
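
For completeness, the standard unrolling of that recurrence (assuming, for simplicity, that n is a power of two and writing the per-call cost as c*n):

T(n) = 2T(n/2) + c*n
     = 4T(n/4) + 2*c*n
     = ...
     = 2^k * T(n/2^k) + k*c*n

and taking k = log2(n) gives T(n) = n*T(1) + c*n*log2(n) = O(n*log(n)).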

Your second version has worse time complexity because of left.pop(0), which is an O(len(left)) operation. The memory requirements of the second version are asymptotically the same as for the first one.
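
If you want to see that cost yourself rather than take it on faith, a quick sketch with timeit (no particular numbers claimed, they depend on your machine) compares draining a list from the front versus from the back:

import timeit

setup = "lst = list(range(50000))"
# number=1 because each run consumes the list; setup builds a fresh one per call
print("pop(0):", timeit.timeit("while lst: lst.pop(0)", setup=setup, number=1))
print("pop(): ", timeit.timeit("while lst: lst.pop()", setup=setup, number=1))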

Here's an O(n*log(n)) time, O(n) space solution with the same structure (using Python 3.3+ syntax):

def mergesort(L):
    return list(merge_sort_gen(L, 0, len(L)))

def merge_sort_gen(L, start, n): # O(n*log(n)) time, O(n) space
    if n == 1:  # a list of one element is always sorted
        yield L[start]
    elif n > 1: # sort halves and merge them
        half = n // 2
        yield from merge(merge_sort_gen(L, start, half),
                         merge_sort_gen(L, start + half, n - half))

Where merge() merges two sorted iterators. You could use heapq.merge() or:

from functools import partial

def merge(sorted_a, sorted_b, done=object()): # O(n) time, O(1) space
    next_a, next_b = partial(next, sorted_a, done), partial(next, sorted_b, done)

    a, b = next_a(), next_b()
    while a is not done and b is not done:
        if b < a:
            yield b
            b = next_b()
        else:
            yield a
            a = next_a()

    item, rest = (a, sorted_a) if b is done else (b, sorted_b)
    yield item #XXX at least one of sorted_a or sorted_b must be non-empty
    yield from rest

You could follow the code step by step at Python Tutor.
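
If you would rather lean on the standard library, heapq.merge() already merges sorted iterables lazily, so (as a sketch keeping the same structure) it can stand in for the hand-written merge():

from heapq import merge as heap_merge

def merge_sort_gen(L, start, n):  # same structure as above, stdlib merge
    if n == 1:
        yield L[start]
    elif n > 1:
        half = n // 2
        yield from heap_merge(merge_sort_gen(L, start, half),
                              merge_sort_gen(L, start + half, n - half))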

yield from iterable produces the same items as the loop below (though the internal details are different):

for item in iterable:
    yield item

See The Python yield keyword explained.

jfs