
I've tried benchmarking both of them, and for some reason, on an array of 1M elements Mergesort finished in 0.3 s while Quicksort took 1.3 s.

I've heard that quicksort is generally faster because of its memory management, but how would one explain these results?

I am running this on a MacBook Pro, if that makes any difference. The input is a set of randomly generated integers from 0 to 127.

The code is in Java:

MergeSort:

static void mergesort(int arr[]) {
    int n = arr.length;
    if (n < 2)
        return;
    int mid = n / 2;
    // copy the two halves into temporary arrays
    int left[] = new int[mid];
    int right[] = new int[n - mid];
    for (int i = 0; i < mid; i++)
        left[i] = arr[i];
    for (int i = mid; i < n; i++)
        right[i - mid] = arr[i];
    // sort each half recursively, then merge back into arr
    mergesort(left);
    mergesort(right);
    merge(arr, left, right);
}

// merge the two sorted halves back into arr
public static void merge(int arr[], int left[], int right[]) {
    int nL = left.length;
    int nR = right.length;
    int i = 0, j = 0, k = 0;
    while (i < nL && j < nR) {
        if (left[i] <= right[j]) {
            arr[k] = left[i];
            i++;
        } else {
            arr[k] = right[j];
            j++;
        }
        k++;
    }
    // copy whatever remains in either half
    while (i < nL) {
        arr[k] = left[i];
        i++;
        k++;
    }
    while (j < nR) {
        arr[k] = right[j];
        j++;
        k++;
    }
}

Quicksort:

public static void quickSort(int[] arr, int start, int end) {
    int partition = partition(arr, start, end);

    // recurse into the parts to the left and right of the pivot's final position
    if (partition - 1 > start) {
        quickSort(arr, start, partition - 1);
    }
    if (partition + 1 < end) {
        quickSort(arr, partition + 1, end);
    }
}

// Lomuto-style partition that uses the last element as the pivot
public static int partition(int[] arr, int start, int end) {
    int pivot = arr[end];

    // move every element smaller than the pivot to the front
    for (int i = start; i < end; i++) {
        if (arr[i] < pivot) {
            int temp = arr[start];
            arr[start] = arr[i];
            arr[i] = temp;
            start++;
        }
    }

    // place the pivot between the two groups
    int temp = arr[start];
    arr[start] = pivot;
    arr[end] = temp;

    return start;
}
Martynas
  • Code? Language? Dataset? Anything? – V0ldek Jan 03 '20 at 21:41
  • If you implement it inefficiently or give it worst-case inputs, it can easily be slower. That's all we can say without any more details, e.g. we can't help you find any problems in your testing methodology or anything. – Peter Cordes Jan 03 '20 at 21:44
  • "I've heard that generally quicksort is faster" The key word in that sentence is "generally". – mypetlion Jan 03 '20 at 21:45
  • I updated the post – Martynas Jan 03 '20 at 21:45
  • [This](https://stackoverflow.com/questions/12228659/what-is-the-sorting-algorithm-for-java) question, while not a duplicate of yours, has some interesting reading on how "worse" sorting algorithms can sometimes be better. – mypetlion Jan 03 '20 at 21:48
  • `int pivot = arr[end];` is very suspicious: easily susceptible to the O(n^2) worst-case behaviour, especially with already-sorted inputs. You haven't shown if you're re-sorting the same data repeatedly or something. You also haven't shown how you warm up the JVM to make sure you're testing fully optimized JITed code. – Peter Cordes Jan 03 '20 at 22:00
  • I am reading from the file of randomly generated integers every time, so it is the same input every time. – Martynas Jan 03 '20 at 22:02
  • 1
    Okay, so what might be the reason for this slow quicksort is that I used input of values from 0 to 127, which makes a lot of duplicates when sorting array of 1M elements. Increasing the interval of values greatly cut down on the time required to sort out the array using quicksort – Martynas Jan 03 '20 at 22:17
  • 1
    You could post that as an answer. Also maybe try a better pivot selection algorithm, like median of 3. BTW, CountingSort should be much more efficient than either Quick or Merge sorts if you can take advantage of the known 0..127 range to avoid hashing, just use the number as an array index. That's small enough that you can unroll and distribute over 2 or 4 histograms to mitigate store-forwarding latency when you have lots of the same number over an interval. – Peter Cordes Jan 04 '20 at 08:07

1 Answer


Your implementations are a bit simplistic:

  • mergesort allocates 2 new arrays at each recursive call, which is expensive, yet some JVMs are surprisingly efficient at optimising such coding patterns (a single-buffer alternative is sketched after this list).
  • quickSort uses a poor choice of pivot, the last element of the subarray, which gives quadratic time for sorted subarrays, including those with identical elements.
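For reference, here is one common way to avoid those per-call allocations: allocate a single scratch buffer up front and reuse it on every merge. This is only an illustrative sketch, not code from the question; mergesortBuffered and sort are made-up names.

/* Sketch: top-down merge sort that allocates one scratch buffer once
   and reuses it on every merge, instead of two new arrays per call. */
static void mergesortBuffered(int[] arr) {
    int[] tmp = new int[arr.length];
    sort(arr, tmp, 0, arr.length - 1);
}

private static void sort(int[] arr, int[] tmp, int lo, int hi) {
    if (lo >= hi)
        return;
    int mid = (lo + hi) >>> 1;
    sort(arr, tmp, lo, mid);
    sort(arr, tmp, mid + 1, hi);

    // merge arr[lo..mid] and arr[mid+1..hi] through the shared buffer
    for (int k = lo; k <= hi; k++)
        tmp[k] = arr[k];
    int i = lo, j = mid + 1;
    for (int k = lo; k <= hi; k++) {
        if (i > mid)
            arr[k] = tmp[j++];          // left half exhausted
        else if (j > hi)
            arr[k] = tmp[i++];          // right half exhausted
        else if (tmp[i] <= tmp[j])
            arr[k] = tmp[i++];          // <= keeps the sort stable
        else
            arr[k] = tmp[j++];
    }
}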

The data set, an array of pseudo-random numbers drawn from the small range 0..127, makes the shortcoming of the quickSort implementation hurt far more than the inefficiency of the mergesort version. Increasing the dataset size would make this even more obvious and might even cause a stack overflow from the excessive recursion depth. Data sets with common patterns such as identical values, increasing or decreasing runs, and combinations of such sequences cause catastrophic performance for this quickSort implementation.
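To make that quadratic behaviour concrete, here is a small self-contained sketch (illustrative code, not part of the original post) that runs the question's quickSort on arrays of identical elements with a comparison counter added to partition; doubling the array size roughly quadruples the count:

public class QuadraticDemo {
    static long comparisons;

    // same quickSort as in the question
    static void quickSort(int[] arr, int start, int end) {
        int p = partition(arr, start, end);
        if (p - 1 > start)
            quickSort(arr, start, p - 1);
        if (p + 1 < end)
            quickSort(arr, p + 1, end);
    }

    // same partition as in the question, with a counter added
    static int partition(int[] arr, int start, int end) {
        int pivot = arr[end];                  // last element as pivot
        for (int i = start; i < end; i++) {
            comparisons++;
            if (arr[i] < pivot) {
                int temp = arr[start];
                arr[start] = arr[i];
                arr[i] = temp;
                start++;
            }
        }
        int temp = arr[start];
        arr[start] = pivot;
        arr[end] = temp;
        return start;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1_000, 2_000, 4_000 }) {
            int[] allEqual = new int[n];       // every element is 0
            comparisons = 0;
            quickSort(allEqual, 0, n - 1);
            // roughly n*n/2 comparisons: doubling n quadruples the work
            System.out.println(n + " equal elements: " + comparisons + " comparisons");
        }
    }
}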

Here is a slightly modified version with a less pathological choice of pivot (the element at 3/4 of the array) and a loop that skips duplicates of the pivot value to improve efficiency on datasets with repeated values. It performs much better (100x) on my standard sorting benchmark with arrays of just 40k elements, but is still much slower (8x) than radixsort:

public static void quickSort(int[] arr, int start, int end) {
    int p1 = partition(arr, start, end);
    int p2 = p1;

    /* skip elements identical to the pivot */
    while (++p2 <= end && arr[p2] == arr[p1])
        continue;

    if (p1 - 1 > start) {
        quickSort(arr, start, p1 - 1);
    }
    if (p2 < end) {
        quickSort(arr, p2, end);
    }
}

public static int partition(int[] arr, int start, int end) {
    /* choose pivot at 3/4 of the array */
    int i = end - ((end - start + 1) >> 2);
    int pivot = arr[i];
    arr[i] = arr[end];
    arr[end] = pivot;

    for (i = start; i < end; i++) {
        if (arr[i] < pivot) {
            int temp = arr[start];
            arr[start] = arr[i];
            arr[i] = temp;
            start++;
        }
    }

    int temp = arr[start];
    arr[start] = pivot;
    arr[end] = temp;

    return start;
}
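As a rough sanity check on the modified version, a driver along these lines can be used. It is only a sketch: it assumes the quickSort and partition methods above live in the same class, generates values in 0..127 like the OP's input, and compares the result against Arrays.sort. For serious timings the JVM should be warmed up first, as noted in the comments above.

import java.util.Arrays;
import java.util.Random;

// add to the class containing the quickSort/partition methods above
public static void main(String[] args) {
    Random rnd = new Random(42);
    int[] data = new int[1_000_000];
    for (int i = 0; i < data.length; i++)
        data[i] = rnd.nextInt(128);        // values in 0..127, like the OP's input

    int[] expected = data.clone();
    Arrays.sort(expected);                 // reference result

    long t0 = System.nanoTime();
    quickSort(data, 0, data.length - 1);
    long t1 = System.nanoTime();

    System.out.println("sorted correctly: " + Arrays.equals(data, expected));
    System.out.println("elapsed: " + (t1 - t0) / 1e6 + " ms");
}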

For the OP's dataset, assuming decent randomness of the distribution, scanning for duplicates is responsible for the performance improvement. Choosing a different pivot, be it first, last, middle, 3/4 or 2/3 or even median of 3 has almost no impact, as expected.

Further testing on other non-random distributions shows catastrophic performance for this quickSort implementation due to the choice of pivot. On my benchmark, much better performance is obtained by choosing the element at 3/4 or 2/3 of the array as the pivot (a 300x improvement for 50k samples, 40% faster than standard merge sort and comparable in time to radix_sort).

  • Mergesort has the distinct advantage of being stable and predictable for all distributions, but it requires extra memory of between 50% and 100% of the size of the dataset.
  • Carefully implemented Quicksort is somewhat faster in many cases and performs in place, requiring only log(N) stack space for recursion. Yet it is not stable and tailor made distributions will exhibit catastrophic performance, possibly crashing.
  • Radixsort is only appropriate for specific kinds of data such as integers and fixed length strings. It also requires extra memory.
  • Countingsort would be the most efficient for the OP's dataset, as it only needs an array of 128 integers to count the number of occurrences of the different values, known to be in the range 0..127. It will execute in linear time for any distribution (a sketch follows this list).
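A minimal counting sort along those lines is only a few lines of code. This is an illustrative sketch (countingSort is not a name from the thread) and assumes every value really lies in 0..127:

/* counting sort for values known to lie in 0..127:
   one pass to count occurrences, one pass to rewrite the array */
static void countingSort(int[] arr) {
    int[] counts = new int[128];          // one counter per possible value
    for (int v : arr)
        counts[v]++;                      // tally occurrences
    int k = 0;
    for (int value = 0; value < counts.length; value++)
        for (int c = counts[value]; c > 0; c--)
            arr[k++] = value;             // write each value back counts[value] times
}

Two linear passes and a 128-entry table, regardless of how the input is distributed.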
chqrlie
  • I would have thought you'd need a median-of-3 or something to reduce the chances of a pessimal pivot in a highly repeating array. If the input is truly random, the 3/4 element is just as likely to be a bad choice as the last, isn't it? Or is that not true for partitions deeper in the recursion, on parts of the array that have already been partitioned a few times? I'm wondering if skipping duplicates of the partition might be where much of the benefit is coming from. – Peter Cordes Jan 05 '20 at 01:07
  • @PeterCordes: 3/4 is a quick and dirty choice to handle many non-random sets in N.log N (monotonic, flat, hill, valley, jigsaw...) Purposely made pathological sets will still defeat it easily. If the input is truly random, the 3/4 element is just as likely to be a **good** choice as any other element :). For the OP's set with duplicate yet randomly distributed numbers, skipping the duplicates does the trick. For already sorted sets, the 3/4 pivot choice is not as good as 1/2 but V shaped sets behave horribly with 1/2 pivot. – chqrlie Jan 05 '20 at 01:24
  • Ok, so the real reason this is faster for the random(0..127) array is not the pivot choice at all, but rather the dup-skipping? And you're just suggesting the 3/4 pivot as a code-review thing for the benefit of *other* data sets? Or does partitioning lead to some sub-arrays of identical elements, e.g. if the pivot choice happens to be a 127 or a 0 (without the dup-skip)? It's good advice (and thanks for the explanation of why 3/4 is good for some likely non-random shapes), I'm just trying to sort out which changes cause what in this case. – Peter Cordes Jan 05 '20 at 01:45
  • 1
    @PeterCordes: You are correct, scanning for duplicates does the job for the OP's dataset. On other non-random datasets with various kinds of symmetries, I get much better performance with 3/4 or even 2/3 fixed pivot. – chqrlie Jan 05 '20 at 16:48
  • Thanks, and nice update to the answer, would upvote again if I could :) – Peter Cordes Jan 05 '20 at 16:49
  • Hey, thanks for this insightful answer, when I changed the dataset I realized the problem was the interval of values, as it created way too many duplicates on larger arrays. However, why does it perform worse on datasets with many duplicates?? Let's say it chooses 60 as pivot, to which subarray do all the 60s of the dataset go? – Martynas Jan 05 '20 at 21:20
  • 1
    @Martynas: when a subarray has all identical elements, `partition` returns the offset `start` after scanning the whole subarray because it finds no smaller element. `quickSort()` then recurses on the subarray starting at `start+1`, which has the same property, etc. The result is N*(N+1)/2 comparisons for this subarray, hence quadratic time complexity. Skipping the duplicates solves this problem. The same happens is the subarray is already sorted and the pivot is chosen as the first or last element. Choosing a different pivot such as the element at 2/3 solves this pathological case. – chqrlie Jan 06 '20 at 03:40