1

I have the following merge sort algorithm that works when I test with 100, 1000, or 10000 long linked lists. However, it returns a segmentation fault on the line containing Node->Next=Merge(Front,Back->Next,Type); when using 100000 or 1000000 long linked lists. This has lead me to believe it is a stack overflow rather than a segmentation fault. According to gdb at the time of the error the stack is extraordinarily full. I cannot find a way to examine exactly how many items are in the call stack to give an exact figure. Any help with reworking merge sort to handle large amounts of data would be greatly appreciated.

struct IntList
{
int Value;
int Frequency;
struct IntList* Next;
struct IntList* Prev;
};//Struct for Integer Linked List
void SortList(struct IntList** Values,enum SortBy Type)
{
    struct IntList* Head = *Values;
    if(Head==NULL||Head->Next==NULL)
    {
        return;
    }//If Base Case
    struct IntList* Front;
    struct IntList* Back;
    Split(Head,&Front,&Back);//Splits Linked List
    SortList(&Front,Type);
    SortList(&Back,Type);//Recursively Sorts
    *Values=Merge(Front,Back,Type);//Merges Halves
    return;
}

void Split(struct IntList* Head,struct IntList** Front,struct IntList** Back)
{
    struct IntList* Fast;
    struct IntList* Slow;
    if (Head==NULL||Head->Next==NULL)
    {
        *Front=Head;
        *Back=NULL;
    }//If Length <2
    else
    {
        Slow=Head;
        Fast=Head->Next;
    }
    while(Fast!=NULL)
    {
        Fast=Fast->Next;
        if(Fast!=NULL)
        {
            Fast=Fast->Next;
            Slow=Slow->Next;
        }
    }//Find Midpoint
    *Front=Head;
    *Back=Slow->Next;
    Slow->Next=NULL;//Breaks Link
    return;
}


struct IntList* Merge(struct IntList* Front,struct IntList* Back,enum SortBy Type)
{
    if(Front==NULL)
    {
        return Back;
    }
    if (Back==NULL)
    {
        return Front;
    }//Base Cases

    struct IntList* Node;
    if(Type==DATA)
    {
        if(Front->Value <= Back->Value)
        {
           Node=Front;
           Node->Next=Merge(Front->Next,Back,Type);
        }
        else
        {
            Node=Back;
            Node->Next=Merge(Front,Back->Next,Type);
        }//Takes Greatest Value for Sorted List
    }//If Sorting by Data
    if(Type==FREQUENCY)
    {
        if(Front->Frequency < Back->Frequency)
        {
           Node=Front;
           Node->Next=Merge(Front->Next,Back,Type);
        }
        else
        {
            Node=Back;
            Node->Next=Merge(Front,Back->Next,Type);
        }//Takes Greatest Frequency for Sorted List
    }//If Sorting by Frequency
    return(Node);
trent
  • 25,033
  • 7
  • 51
  • 90
Dustin H
  • 23
  • 2
  • 8
    Stop using recursion to merge lists; do it iteratively. – user2357112 Aug 30 '17 at 22:51
  • 1
    Well, you could block it in your hosts file, you could use a parental controls app... Oh! You meant a *literal* stack overflow! Well, in that case, user2357112 is right, you've probably hit the recursion limit. ;-) – Charles Srstka Aug 30 '17 at 22:59
  • Using recursion produces elegant code,, but you see first hand here its trouble and limitations. At the segfault (or any breakpoint) in `gdb` you can type `bt` (short for backtrace) and it will print out the call stack ... can't _quiiite_ tell if you know this already. – yano Aug 30 '17 at 23:12

2 Answers2

1

If you want to use recursion, it is best to try to express it in tail-call form (so that nothing is done after the recursive call returns other than a return). That way most compilers will optimize the tail-call into a simple jump and not use any stack space. For your Merge function, it becomes something like:

void Merge(struct IntList **merged, struct IntList* Front,struct IntList* Back,enum SortBy Type)
{
    if(Front==NULL) {
        *merged = Back;
    } else if (Back==NULL) {
        *merged = Front;
    } else if(Type==DATA) {
        if(Front->Value <= Back->Value) {
            *merged = Front;
            Merge(&Front->Next, Front->Next, Back, Type);
        } else {
            *merged = Back;
            Merge(&Back->Next, Front, Back->Next,Type);
        }//Takes Greatest Value for Sorted List if Sorting by Value
    } else if(Type==FREQUENCY) {
        if(Front->Frequency < Back->Frequency) {
            *merged = Front;
            Merge(&Front->next, Front->Next, Back, Type);
        } else {
            *merged = Back;
            Merge(&Back->Next, Front, Back->Next, Type);
        }//Takes Greatest Frequency for Sorted List
    }//If Sorting by Frequency
}

If your compiler doesn't support tail recursion optimization, you can do it yourself by making the body of the function a loop:

void Merge(struct IntList **merged, struct IntList* Front,struct IntList* Back,enum SortBy Type)
{
  while(Front || Back) {
    if(Front==NULL) {
        *merged = Back;
        Back = NULL;
    } else if (Back==NULL) {
        *merged = Front;
        Front = NULL
    } else if(Type==DATA) {
        if(Front->Value <= Back->Value) {
            *merged = Front;
            merged = &Front->Next;
            Front = Front->Next;
        } else {
            *merged = Back;
            merged = &Back->Next;
            Back = Back->Next;
        }//Takes Greatest Value for Sorted List if Sorting by Value
    } else if(Type==FREQUENCY) {
        if(Front->Frequency < Back->Frequency) {
            *merged = Front;
            merged = &Front->Next;
            Front = Front->Next;
        } else {
            *merged = Back;
            merged = &Back->Next;
            Back = Back->Next;
        }//Takes Greatest Frequency for Sorted List
    }//If Sorting by Frequency
  }
}
Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
0

This simplest answer is not to use recursion.

However, if you are dead set on using recursion you can limit what goes on the stack by moving temp variables used in the function to outside the scope of the function or declaring them static so that they can be reused rather than making room for them every time merge is called (can have adverse effects if you are interesting in multi-threaded applications). It doesn't look like you have any variables that would be safe to do this with.

You can also encapsulate parameters with another struct so that you don't have to pass in three pointers onto the new stack call, I.E., one parameter takes up less space than 3.

Something like:

struct IntList* Merge(struct mergeData* data)

There are also ways of adjusting the stack size so that your application can handle the data sets you expect.

Overall, this is a limitation of recursion. If you deal with embedded systems that have limited resources (like myself), then you usually avoid it.

Kyle s
  • 484
  • 5
  • 14
  • 1
    *You can also encapsulate parameters with another struct so that you don't have to pass in three pointers onto the new stack call, I.E., one parameter takes up less space than 3.* --- what? You do realize that you will have to make space for the struct on the caller stack? – Ajay Brahmakshatriya Aug 31 '17 at 04:31
  • Allocate in the heap is another option. – Kyle s Sep 01 '17 at 05:35