13

This generates a Segmentation Fault: 11 and I have no clue why.

Before I get into it, here's the code:

import numpy.random as nprnd
import heapq
import sys

sys.setrecursionlimit(10**6)


def rlist(size, limit_low, limit_high):
    for _ in xrange(size): 
        yield nprnd.randint(limit_low, limit_high)

def iterator_mergesort(iterator, size):
    return heapq.merge(
         iterator_mergesort(
           (iterator.__next__ for _ in xrange(size/2)), size/2),
         iterator_mergesort(
            iterator, size - (size/2))
       )

def test():
    size = 10**3
    randomiterator = rlist(size, 0, size)
    sortediterator = iterator_mergesort(randomiterator, size)
    assert sortediterator == sorted(randomiterator)

if __name__ == '__main__':
    test()

Basically, it's just a mergesort that works on iterators and generator expressions instead of working on lists so as to minimize the memory footprint at any one time. It's nothing special, and uses the heapq.merge() built-in method for merging iterators, so I was quite surprised when everything breaks.

Running the code quickly gives Segmentation Fault: 11 and an error window telling me python has crashed. I have no idea where to look or how to debug this one, so any help would be much appreciated.

reem
  • 7,186
  • 4
  • 20
  • 30
  • Typically, the only times you'll get a segfault in python is either when you're out of memory or there's a bug in one of the C modules you're using. [This question](http://stackoverflow.com/questions/10035541/what-causes-a-python-segmentation-fault) may be useful to you. – rnorris Oct 01 '13 at 23:44
  • Oh I feel quite dumb now, I forgot to stick a base case in my mergesort, so increasing the recursion limit breaks everything. – reem Oct 01 '13 at 23:49
  • 3
    @sortfiend -- If you found the problem, you should probably write it up as a [short answer and accept it](http://meta.stackoverflow.com/help/self-answer), rather then editing the title to say "RESOLVED". That way, this post will play nicer with StackOverflow's algorithms, and you might end up accumulating a few more upvotes here and there :) – Michael0x2a Oct 02 '13 at 00:17

1 Answers1

10

Segmentation Faults in python happen for one of two reasons:

You run out of memory

Bug in a C module

Here, the seg fault belongs to the first. You (I) have a boundless recursion because there is no base case in iterator_mergesort(), it'll keep calling itself on itself forever and ever.

Normally, python throws an exception for this and it will terminate before causing a segfault. However, the recursion limit has been set extremely high so python runs out of memory and breaks before it recognizes it should throw an exception for an unbounded recursion.

Add a base case like so:

...
def iterator_mergesort(iterator, size):
return heapq.merge(
         iterator_mergesort(
           (iterator.next() for _ in xrange(size/2)), size/2),
         iterator_mergesort(
            iterator, size - (size/2))
       ) if size >= 2 else iterator #<-- Specifically this

Now it passes the test() function and sorts, albeit rather slowly.

reem
  • 7,186
  • 4
  • 20
  • 30