Merge sort has a worst-case complexity of O(N log N), while quick sort's worst case is O(N^2), so in theory merge sort should perform better than quick sort. But I have heard that in practice quick sort usually outperforms merge sort, mostly because of merge sort's copying overhead. See the reference.

So I decided to implement and test it myself; below is my full source code in C.

Source

#include <stdio.h>
#include <time.h>

#define SZ 10000000
#define MOD 10000007
#define i64 long long int

i64 nums[SZ];

i64 L[SZ], R[SZ];

i64 seed = 0xff;
i64 srand(){
    seed = (seed + 17 * seed) % MOD;
    return seed;
}

void make(){
    for (register int i = 0; i < SZ; i++)
        nums[i] = srand() % MOD;
}

void swap(i64 *a, i64 *b){
    i64 t = *a;
    *a = *b;
    *b = t;
}

int pivote(int s, int e){

    //int p = s + srand() % (e - s + 1);
    int p = s + (e - s) / 2;
    //int p = s;
    //int p = e;

    i64 v = nums[p];
    int c = s;
    swap(nums + p, nums + e);
    for (register int i = s; i < e; i++){
        if (nums[i] < v){
            swap(nums + i, nums + c);
            c++;
        }
    }
    swap(nums + c, nums + e);
    return c;
}

void qsort(int s, int e){

    if (s < e){
        int p = pivote(s, e);
        qsort(s, p - 1);
        qsort(p + 1, e);
    }
}

void merge(i64 arr[], int l, int m, int r){
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;

    for (i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];

    i = 0;
    j = 0;
    k = l;
    while (i < n1 && j < n2)
    {
        if (L[i] <= R[j])
        {
            arr[k] = L[i];
            i++;
        }
        else
        {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    while (i < n1)
    {
        arr[k] = L[i];
        i++;
        k++;
    }

    while (j < n2)
    {
        arr[k] = R[j];
        j++;
        k++;
    }
}

void mergeSort(i64 arr[], int l, int r){
    if (l < r){
        int m = l + (r - l) / 2;

        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);
        merge(arr, l, m, r);
    }
}


void testQsort(){
    double s, e;

    make();

    s = clock();
    qsort(0, SZ - 1);
    e = clock();
    printf("qsort random: %f ms\n", (e - s) * 1000 / CLOCKS_PER_SEC);

    s = clock();
    qsort(0, SZ - 1);
    e = clock();
    printf("qsort sorted: %f ms\n", (e - s) * 1000 / CLOCKS_PER_SEC);
}

void testMsort(){
    double s, e;

    make();

    s = clock();
    mergeSort(nums, 0, SZ - 1);
    e = clock();
    printf("msort random: %f ms\n", (e - s) * 1000 / CLOCKS_PER_SEC);

    s = clock();
    mergeSort(nums, 0, SZ - 1);
    e = clock();
    printf("msort sorted: %f ms\n", (e - s) * 1000 / CLOCKS_PER_SEC);
}

int main(){

    testMsort();
    testQsort();

    return 0;
}

Result for 10 million elements:

msort random: 4596.000000 ms
msort sorted: 3354.000000 ms
qsort random: 7637.000000 ms
qsort sorted: 5074.000000 ms

I have tried four versions of quick sort:

  • pivot at the first position
  • pivot at the last position
  • pivot at the middle position
  • pivot at a random position

None of the versions of quick sort seems to outperform merge sort. Could anyone tell me why it is said that quick sort outperforms merge sort?

Is there anything wrong with my quick sort implementation?

Update 1

Following @rcgldr's answer below, I tested the following version of quick sort (Hoare partition scheme), and it finally outperforms every version of merge sort I tried.

void qsort3(int s, int e){
    if (s < e){
        i64 p = nums[(s + e) / 2];
        int i = s - 1;
        int j = e + 1;
        while (1){
            while (nums[++i] < p);
            while (nums[--j] > p);
            if (i >= j) break;
            swap(nums + i, nums + j);
        }
        qsort3(s, j);
        qsort3(j + 1, e);
    }
}
James Z
Sazzad Hissain Khan
    1) Where pivot is does not matter on a random list. 2) Most qsort implementations fall back to insertion sort when the list is small enough (say, under 10 elements). – Amadan Oct 02 '18 at 04:17
    To make sure your implementations are correct, you should also check that the sorted list is in fact sorted. It's also a bit suspicious that the scope access is a bit different between qsort and msort, which could have a dramatic impact on speed. Remember, Computer Science tells us how fast an algorithm is in terms of mathematical operations, but it doesn't consider hardware efficiencies (cache hit/miss, RAM alignment, difference in random and sequential access, etc). And when you aren't looking at machine code, you don't know what inefficiencies your compiler is adding in. – IceArdor Oct 02 '18 at 04:38
  • @IceArdor thanks, I rechecked and confirming that the lists are properly sorted. Thanks for your points, I was curious about the cases you mentioned. Both in Mac & Windows it seems merge sort outperforming quick sort. – Sazzad Hissain Khan Oct 02 '18 at 04:43
    Think about what could be happening with the hardware between the two runs. It's impossible to run code on a general purpose processor and have it take the exact amount of time both runs. You include memory allocation in the both timing results. I'm also not convinced that your code exactly matches the Quicksort and Mergesort reference algorithms. If you care about timing, think about every additional mathematical operation your code does, since at its core, sorting isn't doing any heavy lifting (it's just moving values around in memory, basic arithmetic and comparisons). – IceArdor Oct 02 '18 at 05:03
  • For example, consider the number of unoptimized math operations required for `for (int j=0; j>=SZ; j++) { swap(a+j, k); }` and `int stop = a+SZ; for (int j=a; j>=stop; j++) { swap(j, k); }` – IceArdor Oct 02 '18 at 05:07

1 Answer

The question's quick sort example is based on the Lomuto partition scheme, which is slower than the Hoare partition scheme. Here is a link to an example of the Hoare partition scheme:

QuickSort with middle elemenet as pivot

The merge sort example constantly creates sub-arrays and copies data. A more efficient approach is a one-time allocation of a working array, then changing the direction of the merge based on the level of recursion for top-down merge sort, or on the pass count for bottom-up merge sort. Here is a link to Java source code showing both bottom-up and top-down merge sort; it can easily be converted to C:

'MergeSort Algorithm' - What's the better implementation in JAVA?

As for relative performance, a simple quick sort such as the one linked in this answer is about 15% faster than a basic merge sort when sorting an array of simple elements like integers or floating point numbers. However, if the quick sort is enhanced to avoid its worst-case O(n^2) time complexity, that advantage shrinks, and the main remaining advantage is that it doesn't require the O(n) space that merge sort needs for its merge operations.

In general, merge sort does more moves but fewer compares than quick sort. If sorting an array of pointers to objects, the compare overhead becomes greater than the cost of moving pointers, and merge sort ends up being faster. On the other hand, sorting an array of pointers to objects involves random access to those objects, which isn't cache friendly, and it's faster to sort the objects themselves rather than the pointers unless the objects are fairly large (the trade-off is typically around 128 to 256 bytes, depending on the system).

rcgldr
  • Thank you so much for pointing out the reason. I have seen your implementation; it's really an awesome and very small piece of code. I have one confusion though: shouldn't it work fine if I change the last two lines to `QuickSort(a, lo, i-1); QuickSort(a, i, hi);` rather than `QuickSort(a, lo, j); QuickSort(a, j + 1, hi);`? – Sazzad Hissain Khan Oct 02 '18 at 12:24
  • As for quick sort outperforming "any version" of merge sort: on a processor with 16 registers, most of them used as pointers, a 4-way merge sort is about as fast as a quick sort. However, I don't know if a 4-way merge sort is faster than a 3-way quick sort. – rcgldr Oct 02 '18 at 15:00