
I have to measure the time it takes quick-sort and merge-sort to sort integers from a file of random numbers.

I read online that for large amounts of data merge-sort is supposed to be faster than quick-sort, but my measurements say the opposite: merge-sort takes almost twice as long as quick-sort.

Is my implementation of these algorithms the problem?

Results of the measurements:

Quick-Sort:

  • 5 million int - 715,194,000 ns
  • 10 million int - 1,383,187,400 ns
  • 50 million int - 6,819,586,800 ns
  • 100 million int - 14,159,986,000 ns
  • 150 million int - 22,431,202,200 ns

Merge-Sort:

  • 5 million int - 1,644,349,000 ns
  • 10 million int - 2,186,410,800 ns
  • 50 million int - 14,427,917,500 ns
  • 100 million int - 26,487,337,400 ns
  • 150 million int - 42,229,109,700 ns
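
A minimal harness of the kind that produces numbers like these might look as follows (a sketch: the seed and in-memory data generation are stand-ins for the file of random numbers, and it assumes the quickSort/mergeSort methods below are in scope):

     import java.util.Random;

     public class SortBenchmark {
         public static void main(String[] args) {
             int n = 5_000_000; // one of the sizes measured above
             int[] data = new Random(42).ints(n).toArray(); // stand-in for the file input

             int[] copy = data.clone();
             long t0 = System.nanoTime();
             quickSort(copy, 0, copy.length);
             System.out.println("Quick-Sort: " + (System.nanoTime() - t0) + " ns");

             copy = data.clone();
             t0 = System.nanoTime();
             mergeSort(copy, 0, copy.length);
             System.out.println("Merge-Sort: " + (System.nanoTime() - t0) + " ns");
         }

         // quickSort, partition, mergeSort and merge as listed below
     }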

     // Quick Sort implementation: sorts a[start..end) in place
     public static void quickSort(int[] a, int start, int end){
         if(end - start < 2){
             return; // zero or one element is already sorted
         }

         int pivotIndex = partition(a, start, end);
         quickSort(a, start, pivotIndex);
         quickSort(a, pivotIndex + 1, end);
     }
    
     // Partitions a[start..end) around the pivot a[start] (Hoare-style)
     // and returns the pivot's final index.
     public static int partition(int[] a, int start, int end){

         int pivot = a[start];
         int i = start;
         int j = end;

         while(i < j){
             // scan from the right for an element smaller than the pivot
             while(i < j && a[--j] >= pivot);
             if(i < j){
                 a[i] = a[j];
             }

             // scan from the left for an element larger than the pivot
             while(i < j && a[++i] <= pivot);
             if(i < j){
                 a[j] = a[i];
             }
         }

         a[j] = pivot; // drop the pivot into its final position

         return j;
     }
    
    
    
     // Merge Sort implementation: sorts input[start..end)
     public static void mergeSort(int[] input, int start, int end){
         if(end - start < 2){
             return; // zero or one element is already sorted
         }

         int mid = (start + end)/2;
         mergeSort(input, start, mid);
         mergeSort(input, mid, end);

         merge(input, start, mid, end);
     }
    
     // Merges the sorted runs input[start..mid) and input[mid..end)
     public static void merge(int[] input, int start, int mid, int end){
         if(input[mid-1] <= input[mid]){
             return; // the two runs are already in order
         }

         int i = start, j = mid, tempIndex = 0;

         int[] temp = new int[end - start]; // allocated anew on every merge call

         while(i < mid && j < end){
             temp[tempIndex++] = input[i] <= input[j] ? input[i++] : input[j++];
         }

         // only the left run's leftovers need moving; the right run is already in place
         System.arraycopy(input, i, input, start + tempIndex, mid - i);
         System.arraycopy(temp, 0, input, start, tempIndex);
     }
    
  • That merge sort is twice as slow because it is not efficient. Example 2-way merge sorts in [this answer](https://stackoverflow.com/questions/46106922/46107386#46107386). Example hybrid 4-way merge sort + insertion sort in Java in [this answer](https://stackoverflow.com/questions/55840901/55842481#55842481). The insertion sort part could be removed. In my testing, merge sort is about 15% slower than quick sort on arrays of integers, and 4-way merge sort (without heap) is about the same speed as quick sort. Using insertion sort for small sub-arrays speeds up both quick sort and merge sort. – rcgldr Dec 06 '20 at 16:42
  • …and now try the same with arrays not filled with random data, but ascending numbers. That’s a good starting point for learning the difference between best case, average case, and worst case. – Holger Dec 07 '20 at 12:32

2 Answers


I think that even though the array size is on the order of 150 * 10^6, quick sort gains an advantage for the following reason:

Assuming the size of an int in Java to be 4 bytes:

(150 * 10^6 ints * 4 bytes) / 1024 / 1024 ≈ 572 MB

The size of the L3 cache is around 50 MB.

Therefore, roughly (572 - 50) ≈ 522 MB of the data has to be served from main memory (excluding L1 and L2, as their sizes are relatively small).

On top of that, merge sort needs additional main-memory traffic for its auxiliary arrays. Accessing main memory is an expensive operation, and merge sort needs more of it than quick sort, which partitions in place, precisely because of those auxiliary arrays.

  • Classic merge sort needs a 3-way cache, 2 for the two sub-arrays being merged, 1 for the merged output. Most x86 processors have an 8-way cache, so no issue there. I find classic merge sort to only be 15% slower than quick sort on large arrays, and I find 4-way bottom-up merge sort (without heap) to be as fast as quick sort on the systems I've tested it on. The OP's premise that merge sort is twice as slow is due to a poor implementation of merge sort. – rcgldr Dec 06 '20 at 16:36

In an ideal world, with a truly random list of numbers, quick-sort should come out best; there are, however, some problems when the data misbehaves.

These look like very nice implementations of the original papers. I will just assume that you have checked that the integers are indeed sorted correctly.

Some corner cases that should be checked, since they can degrade to O(N^2) when you select the first element as pivot (a sketch for generating them follows this list):

  • already sorted, as you select the first element as pivot
  • organ pipe (1,2,3,2,1) can cause misbehavior too
  • reverse sorted, if you had taken the last element as pivot
  • few unique values; try a pattern with only 0s and 1s
  • nearly sorted, with a few misplaced numbers
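
A sketch for generating these patterns for testing (class and method names are illustrative):

     import java.util.Random;

     class AdversarialInputs {
         // already sorted: 0,1,2,...
         static int[] sorted(int n) {
             int[] a = new int[n];
             for (int i = 0; i < n; i++) a[i] = i;
             return a;
         }

         // reverse sorted: n,n-1,...,1
         static int[] reverseSorted(int n) {
             int[] a = new int[n];
             for (int i = 0; i < n; i++) a[i] = n - i;
             return a;
         }

         // organ pipe: rises to the middle, then falls (1,2,3,2,1)
         static int[] organPipe(int n) {
             int[] a = new int[n];
             for (int i = 0; i < n; i++) a[i] = Math.min(i, n - 1 - i);
             return a;
         }

         // few uniques: only 0s and 1s
         static int[] fewUniques(int n) {
             return new Random(1).ints(n, 0, 2).toArray();
         }

         // nearly sorted: sorted, then a few random swaps
         static int[] nearlySorted(int n, int misplaced) {
             int[] a = sorted(n);
             Random r = new Random(1);
             for (int k = 0; k < misplaced; k++) {
                 int i = r.nextInt(n), j = r.nextInt(n);
                 int t = a[i]; a[i] = a[j]; a[j] = t;
             }
             return a;
         }
     }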

For the merge sort, you need to check that (start + end) doesn't cause an integer overflow.
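
The standard fix is to compute the midpoint from the difference, which cannot overflow:

     // int mid = (start + end) / 2;       // can overflow when start + end > Integer.MAX_VALUE
     int mid = start + (end - start) / 2;  // safe: end - start always fits in an int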

In merge sort you can also optimize with some smart swapping of the allocations so that you only need to allocate once, as sketched below.
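
One common form of that optimization, adapted to the signatures in the question (a sketch, simpler than full buffer swapping): allocate a single scratch array up front and pass it down, instead of creating a new temp array in every merge call.

     public static void mergeSort(int[] input) {
         int[] temp = new int[input.length]; // single allocation, reused by every merge
         mergeSort(input, temp, 0, input.length);
     }

     private static void mergeSort(int[] input, int[] temp, int start, int end) {
         if (end - start < 2) {
             return;
         }
         int mid = start + (end - start) / 2;
         mergeSort(input, temp, start, mid);
         mergeSort(input, temp, mid, end);
         merge(input, temp, start, mid, end);
     }

     private static void merge(int[] input, int[] temp, int start, int mid, int end) {
         if (input[mid - 1] <= input[mid]) {
             return; // the runs are already in order
         }
         int i = start, j = mid, k = start; // write into temp at the same offsets
         while (i < mid && j < end) {
             temp[k++] = input[i] <= input[j] ? input[i++] : input[j++];
         }
         // only the left run's leftovers need moving; the right run is already in place
         System.arraycopy(input, i, input, k, mid - i);
         System.arraycopy(temp, start, input, start, k - start);
     }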

Both algorithms can be optimized with an insertion sort when the length of the sub-array falls below some threshold, which usually lies within 11-32; for example:
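
A sketch applied to the quick sort in the question (the threshold of 16 is just one value in that range; partition is the method from the question):

     private static final int INSERTION_THRESHOLD = 16; // any value in roughly 11-32

     public static void quickSort(int[] a, int start, int end) {
         if (end - start < INSERTION_THRESHOLD) {
             insertionSort(a, start, end); // cheap on tiny sub-arrays
             return;
         }
         int pivotIndex = partition(a, start, end); // partition as in the question
         quickSort(a, start, pivotIndex);
         quickSort(a, pivotIndex + 1, end);
     }

     private static void insertionSort(int[] a, int start, int end) {
         for (int i = start + 1; i < end; i++) {
             int key = a[i];
             int j = i - 1;
             while (j >= start && a[j] > key) {
                 a[j + 1] = a[j]; // shift larger elements right
                 j--;
             }
             a[j + 1] = key;
         }
     }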

  • Few uniques is not an issue if using Hoare partition scheme. Although needless swaps are done, with Hoare, the more duplicates there are, the closer to ideal partitioning split (50% 50%). – rcgldr Dec 08 '20 at 17:24