Quick Sort algorithm not working for all arrays

Question

This is for a homework assignment. I will eventually be translating this code into MIPS assembly, but that is the easy part for me. I have been debugging this code for hours and hours and have been to my professors office hours, but I still cannot make my quick sort algorithm work. Here is the code along with a few of my comments about where I think the problem areas are:

// This struct is in my .h file
typedef struct {
    // v0 points to the first element in the array equal to the pivot
    int *v0;

    // v1 points to the first element in the array greater than the pivot (one past the end of the pivot sub-array)
    int *v1;
} PartRet;

void swap(int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

PartRet partition(int *lo, int *hi) {
    // Will later be translating this to MIPS where 2 values can be returned. I am using a PartRet struct to simulate this.
    PartRet retval;

    // We must use the last item as the pivot
    int pivot = *hi;

    int *left = lo;

    // Take the last value before the pivot
    int *right = hi - 1;

    while (left < right) {
        while((left < hi) && (*left <= pivot)) {
            ++left;
        }

        while((right > lo) && (*right > pivot)) {
            --right;
        }

        if (left < right) {
            swap(left++, right--);
        }
    }

    // Is this correct? left will always be >= right after the while loop
    if (*hi < *left) {
        swap(left, hi);
    }

    // MADE CHANGE HERE
    int *v0 = hi;
    int *v1;

    // Starting at the left pointer, find the beginning of the sub-array where the elements are equal to the pivot
    // MADE CHANGE HERE
    while (v0 > lo && *(v0 - 1) >= pivot) {
        --v0;
    }

    v1 = v0;

    // Starting at the beginning of the sub-array where the elements are equal to the pivot, find the element after the end of this array.
    while (v1 < hi && *v1 == pivot) {
    ++v1;
    }

    if (v1 <= v0) {
        v1 = hi + 1;
    }

    // Simulating returning two values
    retval.v0 = v0;
    retval.v1 = v1;

    return retval;
}

void quicksort(int *array, int length) {
    if (length < 2) {
        return;
    }

    PartRet part = partition(array, array + length - 1);

    // I *think* this first call is correct, but I'm not sure.
    int firstHalfLength = (int)(part.v0 - array);
    quicksort(array, firstHalfLength);

    int *onePastEnd = array + length;
    int secondHalfLength = (int)(onePastEnd - part.v1);

    // I have a feeling that this isn't correct
    quicksort(part.v1, secondHalfLength);
}

I have even tried rewriting the code using code samples online but the requirement is to use the lo and hi pointers but no code samples that I have found use this. When I debug the code I only end up making the code work for some arrays, not others, especially when the pivot is the smallest element in the array.

As a general rule, I would recommend checking your assumptions against reality. Don't say "this code probably won't ever be hit" - put a breakpoint inside the if, and see whether the breakpoint is hit. Granted, if your breakpoint is never hit, you have not conclusively proven that the breakpoint will never be hit for any input, but you have shown that the code inside the if isn't affecting the tests you are running. — Adam Mihalcin, Mar 27 '13 at 04:58
Please show us compilable code — an SSCCE ([Short, Self-Contained, Correct Example](http://sscce.org/)). What is a `PartRet` structure? Why do you have pointers passed into your partition function? I don't recall ever seeing a partition function that took pointers only; maybe an array base but usually just a couple of array indices. OTOH, I haven't studied quick sort code recently. — Jonathan Leffler, Mar 27 '13 at 04:59
Have you printed out the results of your partition code? Are the partitions it creates correctly partitioned? — Jonathan Leffler, Mar 27 '13 at 05:02
Adam - You're right, the comment is misleading. I just put a breakpoint there and ran my program a number of times using arrays of different sizes and different cases (same elements, ascending, descending. etc.) and the breakpoint was never hit. — greg N, Mar 27 '13 at 05:07
Jonathan - Sorry, I edited that in a minute after I posted it. I have pointers being passed in because I will be translating this code to MIPS and the homework assignment requires pointers, not indices for the partition function. — greg N, Mar 27 '13 at 05:13
Jonathan's 2nd comment - I have been tracing the pointers on paper and I strongly believe that the partition locations are correct, but each partition may contain one incorrect value (i.e. it was incorrectly swapped into that partition). For example when I pass in the array [7, 6, 1], my partition function swaps the 1 and 7 (as it should), but then swaps the 7 and 6 (it shouldn't, but I can only seem to fix it for certain cases which break other cases that were previously working). The resulting array after a call to quicksort is [1, 7, 6]. — greg N, Mar 27 '13 at 05:13
For one thing, you're missing the condition of `if (left < right)` before the swap in the inner loop. You cannot assume the prior code is guaranteed to reach the inner-swap with left always being below right. It should read `if (left < right) { swap(left++, right--); }` — WhozCraig, Mar 27 '13 at 05:21
@WhozCraig - I thought that the continue statements would take care of that, but I could be wrong. I also changed how I advance my left and right pointer a bit to make it look less cryptic. (Side note: is it proper SO etiquette to edit my code like that?) — greg N, Mar 27 '13 at 05:33
+1 for actually putting in effort and taking on board ideas given in the answers below, and using a debugger. — Randy Howard, Mar 27 '13 at 06:47
+1 Likewise. quicksort-recursion is a challenge when first exposed to it. Taking time to get into it, consider multiple ways of looking at it, and really examining your code with a debugger is good. Keep that practice and hone it. — WhozCraig, Mar 27 '13 at 09:01

score 1 · Answer 1 · answered Mar 27 '13 at 05:26

There are problems in the partitioning code. Here are 4 simple test outputs, from the SSCCE code below:

array1:
Array (Before):
[6]: 23 9 37 4 2 12
Array (First half partition):
[3]: 2 9 4
Array (First half partition):
[1]: 2
Array (Second half partition):
[1]: 9
Array (Second half partition):
[2]: 23 37
Array (First half partition):
[0]:
Array (Second half partition):
[1]: 23
Array (After):
[6]: 2 4 9 12 37 23


array2:
Array (Before):
[3]: 23 9 37
Array (First half partition):
[1]: 23
Array (Second half partition):
[1]: 9
Array (After):
[3]: 23 37 9


array3:
Array (Before):
[2]: 23 9
Array (First half partition):
[0]:
Array (Second half partition):
[1]: 23
Array (After):
[2]: 9 23


array4:
Array (Before):
[2]: 9 24
Array (First half partition):
[0]:
Array (Second half partition):
[1]: 9
Array (After):
[2]: 24 9

The SSCCE code

#include <stdio.h>

typedef struct
{
    int *v0;    // v0 points to the first element in the array equal to the pivot
    int *v1;    // v1 points to the first element in the array greater than the pivot (one past the end of the pivot sub-array)
} PartRet;

static void dump_array(FILE *fp, const char *tag, int *array, int size)
{
    fprintf(fp, "Array (%s):\n", tag);
    fprintf(fp, "[%d]:", size);
    for (int i = 0; i < size; i++)
        fprintf(fp, " %d", array[i]);
    putchar('\n');
}

static void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

static PartRet partition(int *lo, int *hi)
{
    // Will later be translating this to MIPS where 2 values can be
    // returned.  I am using a PartRet struct to simulate this.
    PartRet retval;

    // This code probably won't ever be hit as the base case in the QS
    // function will return first
    if ((hi - lo) < 1)
    {
        retval.v0 = lo;
        retval.v1 = lo + (hi - lo) - 1;
        return retval;
    }

    // We must use the last item as the pivot
    int pivot = *hi;

    int *left = lo;

    // Take the last value before the pivot
    int *right = hi - 1;

    while (left < right)
    {
        if (*left <= pivot)
        {
            ++left;
            continue;
        }

        if (*right >= pivot)
        {
            --right;
            continue;
        }

        swap(left, right);
    }

    // Is this correct? left will always be >= right after the while loop
    swap(left, hi);

    int *v0 = left;
    int *v1;

    // Starting at the left pointer, find the beginning of the sub-array
    // where the elements are equal to the pivot
    while (v0 > lo && *(v0 - 1) == pivot)
    {
        --v0;
    }

    v1 = v0;

    // Starting at the beginning of the sub-array where the elements are
    // equal to the pivot, find the element after the end of this array.
    while (v1 < hi && *v1 == pivot)
    {
        ++v1;
    }

    // Simulating returning two values
    retval.v0 = v0;
    retval.v1 = v1;

    return retval;
}

static void quicksort(int *array, int length)
{
    if (length < 2)
    {
        return;
    }

    PartRet part = partition(array, array + length - 1);

    // I *think* this first call is correct, but I'm not sure.
    int firstHalfLength = (int)(part.v0 - array);
    dump_array(stdout, "First half partition", array, firstHalfLength);
    quicksort(array, firstHalfLength);

    int *onePastEnd = array + length;
    int secondHalfLength = (int)(onePastEnd - part.v1);

    // I have a feeling that this isn't correct
    dump_array(stdout, "Second half partition", part.v1, secondHalfLength);
    quicksort(part.v1, secondHalfLength);
}

static void mini_test(FILE *fp, const char *name, int *array, int size)
{
    putc('\n', fp);
    fprintf(fp, "%s:\n", name);
    dump_array(fp, "Before", array, size);
    quicksort(array, size);
    dump_array(fp, "After", array, size);
    putc('\n', fp);
}

int main(void)
{
    int array1[] = { 23, 9, 37, 4, 2, 12 };
    enum { NUM_ARRAY1 = sizeof(array1) / sizeof(array1[0]) };
    mini_test(stdout, "array1", array1, NUM_ARRAY1);

    int array2[] = { 23, 9, 37, };
    enum { NUM_ARRAY2 = sizeof(array2) / sizeof(array2[0]) };
    mini_test(stdout, "array2", array2, NUM_ARRAY2);

    int array3[] = { 23, 9, };
    enum { NUM_ARRAY3 = sizeof(array3) / sizeof(array3[0]) };
    mini_test(stdout, "array3", array3, NUM_ARRAY3);

    int array4[] = { 9, 24, };
    enum { NUM_ARRAY4 = sizeof(array4) / sizeof(array4[0]) };
    mini_test(stdout, "array4", array4, NUM_ARRAY4);

    return(0);
}

I've not made any algorithmic changes in the sorting code. I simply added the dump_array() function, and made strategic calls to it, and added the mini_test() function and the main(). These sorts of fixtures are immensely helpful. Note that when the input array is of size 2 and is in the wrong order, the partitioning is correct, but when the size is 2 and is in the right order, the partitioning has reversed the positions of the array elements. This is problematic! Resolve it, and you may have most of the rest fixed. Do play with some 3 element arrays in all 6 permutations (of 3 distinct values); consider playing with 3 elements and only 2 distinct values, and 3 elements and just 1 value too.

Wow, that really clears it up! I have made a few changes to my code, I'll spend some time with the debugger to see if I can further pinpoint my issues. — greg N, Mar 27 '13 at 06:02
Followup - My program now works for all test cases you provided! — greg N, Mar 27 '13 at 07:28
(Sorry, still getting used to editing comments) I reflected code changes in OP. But - There is still a case where it doesn't work and I don't understand why. The algorithm seems to work like it should (but obviously it doesn't). Here is the output using your test functions and some comments: `array2: Array (Before): [4]: 8 4 6 7 Array (First half partition): (CORRECT) [2]: 6 4 Array (First half partition): (CORRECT) [0]: Array (Second half partition): (CORRECT) [1]: 6 Array (Second half partition): (INCORRECT - SHOULD BE 8) [0]: Array (After): [4]: 4 6 8 7` — greg N, Mar 27 '13 at 07:35

WhozCraig · Answer 2 · 2013-03-27T06:22:08.857

Note: I'm only posting this for you to have a look at a different approach. I've already pointed out at least one defect in your algorithm in-comment above. Consider this.

A traditional in-place quicksort with integrated partitioning usually looks something like the following, though everyone seems to have their favorites. I prefer it like this for the simplicity (the partition and recursion are in the same proc). Most of all, the translation to assembly is stupid-simple, but you can probably already tell that by now:

void quicksort(int *lo, int *hi)
{
    /* early exit on trivial slice */
    size_t len = (hi - lo) + 1;
    if (len <= 1)
        return;

    /* use hi-point for storage */
    swap(lo + len/2, hi);

    /* move everything in range below pivot */
    int *pvt=lo, *left=lo;
    for(; left != hi; ++left)
        if (*left <= *hi)
            swap(left, pvt++);

    /* this is the proper spot for the pivot */
    swap(pvt, hi);

    /* recurse sublists. do NOT include pivot slot. */
    quicksort(lo, pvt-1);
    quicksort(pvt+1, hi);
}

The only real variable in this algorithm is how to compute the "spot" where the initial pivot value is extracted. Many implementations today use a random point, as it helps diffuse the degenerative performance of quicksort for nearly-sorted lists:

    swap(lo + (rand() % len), hi);

Other implementations just like grabbing the element out of the middle of the slice, as I have done in the algorithm above:

    swap(lo + len/2, hi);

You might find it interesting that some people grab no mid-element and swap at all, just using whatever happens to be in the hi-slot as the pivot value. That reduces the code by a mere one line but with a HUGE caveat: it will have a guaranteed terrible swap-performance on nearly-sorted or entirely-sorted lists (everything would swap).

Anyway, if nothing else I hope it helps you wrap your head around what the partitioning part of the algorithm is trying to do: shove everything below the pivot value in slots at the head of the list, everything above it at the tail of the list, then recursing into those partitions, but above all, do NOT include the pivot slot in that recursion of either slice. It is already where it needs to be (and is, in fact, the only thing that is guaranteed to be so at that point).

Good to know, thanks! I haven't read much on good techniques to pick the optimal pivot, it is interesting to see some different approaches. — greg N, Mar 27 '13 at 06:51
For picking pivots, see [Quicksort: Choosing the Pivot](http://stackoverflow.com/questions/164163/quicksort-choosing-the-pivot/164183#164183). Note the killer adversary paper, for amusement's sake if nothing else. — Jonathan Leffler, Mar 27 '13 at 09:34
@JonathanLeffler I love that post =P Its all-that-and-then-some for everything you ever wanted to know (and a lot that you *didn't*) about choosing pivot values. — WhozCraig, Mar 27 '13 at 09:52

Quick Sort algorithm not working for all arrays

2 Answers2

The SSCCE code