1

python has a default maximum recursion depth which I am able to increase:

import sys

sys.setrecursionlimit(100000)

I'm using merge sort and when I try it on a list of 80000 elements, python "quits unexpectedly". This won't be an issue of I implemented merge sort iteratively, but I am interested in the recursive one.

I'm using a Mac OSX 8GB Memory. Is there a way to get this to work on my machine, or will it work on a better machine?

import sys

sys.setrecursionlimit(100000) # python has a recurison depth of < 1000~. so for the purpose of this assignment I'm increasing it

counter = 0


def merge_sort(lst):
    global counter
    if len(lst) <= 1:
        counter += 1   # increment counter when we divide array in two
        return lst
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])
    right = merge_sort(lst[mid:])
    return merge(left, right)


def merge(left, right):
    global counter
    if not left:
        counter += 1   # increment counter when not left (not left - is also comparison)
        return right
    if not right:
        counter += 1   # the same as above for right
        return left
    if left[0] < right[0]:
        counter += 1   # and the final one increment
        return [left[0]] + merge(left[1:], right)
    return [right[0]] + merge(left, right[1:])


lines = [line.rstrip('\n') for line in open('words.txt')]

when I try the above on a 40000 it works and sorts the list:

print(merge_sort(lines[0:40000]))

but on 50000 or above it doesn't. The total number of words in the .txt file is around 80000

the message I get:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
wa7d
  • 129
  • 1
  • 2
  • 12
  • Are you sure that recursion depth is your problem? – Amber Jul 14 '17 at 06:45
  • Is it the stack depth? I'd appreciate your insight on this – wa7d Jul 14 '17 at 06:47
  • 3
    Default recursion depth of python should be enough for list of 80000 elements. Because depth of recursion of merge sort for that will be just `log_2(80000) = 17` approx. Mostly likely your implementation has a bug in it. – Pratik Deoghare Jul 14 '17 at 06:48
  • mergesort should get nowhere near maximum recursion depth for just 80000 elements. With default recursion depth you should be able to sort 2^1000 elements without hitting the limits, a number so large you could literally sort a list of atoms in the universe and not get close. Your implementation must be the problem. – Zinki Jul 14 '17 at 07:02
  • edited to include implementation – wa7d Jul 14 '17 at 07:04
  • The issue may not be the recursion depth, but that you're making many copies of the list - each time you take a slice. Try instead passing the original list and the indices of each section, and editing the list in place. – perigon Jul 14 '17 at 07:15
  • Also, if you're using this for a real problem (rather than just to practice coding merge sort), you're almost certainly better off using Python's built-in sort. – perigon Jul 14 '17 at 07:17
  • Seems the problem isn't the `merge_sort` (which has logarithmic recursion depth) but you implementation of `merge`, which has LINEAR recursion depth. So due to how you wrote `merge()`, the recursion depth here can reach `len(lines)`. You should rewrite the `merge()` function to a loop. – Zinki Jul 14 '17 at 07:19
  • @user55449 yes I'm doing this for practice. I want to know the reason why it stops working when the length of the list is greater than 40000~ – wa7d Jul 14 '17 at 07:22
  • @Zinki certainly I can implement merge sort iteratively but I'm more interested on why it stops working at a certain list length. – wa7d Jul 14 '17 at 07:23
  • You're making a lot of pretty big lists, and keeping all of them. At some list size, your computer's memory will fill up in the process. It seems that on your machine, that size is 40000 - I don't think there's a more interesting explanation there (though maybe someone else has a more satisfying explanation). If you avoid making all the lists and make UmNyobe's suggested edits, you'll be able to sort longer lists. – perigon Jul 14 '17 at 07:32
  • @user55449 I am using this single test only print(merge_sort(lines[0:40000])) and it still doesn't work. I don't think the problem is making a lot of big lists. I have concluded that it's a python issue form the answer below. – wa7d Jul 14 '17 at 07:34

1 Answers1

4

The problem comes from your merge(left, right) implementation which is recursive in O(n). You merge two sorted list by one element at each recursion step. The idea of merge being recursive may make sense in languages where tail-recursion is optimized, but it is not the case in python.

In general, merge is iterative as its complexity will always be at least the number of elements to merge.

def merge(left, right):
    merged = []
    i = 0
    j = 0
    while i < len(left) and j < len(right) :
        if left[i] < right[j]:
            merged.append(left[i])
            i+=1
        else:
            merged.append(right[j])
            j+=1
    merged += left[i:]
    merged += right[j:]
    return merged
UmNyobe
  • 22,539
  • 9
  • 61
  • 90
  • Thanks, Luffy. this makes sense now. I take it there's no way around this except re-implementing the code iteratively – wa7d Jul 14 '17 at 07:27
  • You can just make merge iterative, while keeping the main mergesort method recursive. Your overall runtime is O(n log n), and the recursion depth is O(log n). – perigon Jul 14 '17 at 07:32
  • 1
    @wa7d implementation added – UmNyobe Jul 14 '17 at 07:39