
I implemented this merge sort based on the pseudo-code in Cormen's book (CLRS). I'm reproducing it here, since it's short:

void merge(vector<double> &array, int start, int mid, int end) {
    int i = start;    // cursor into the left half, array[start..mid]
    int j = mid + 1;  // cursor into the right half, array[mid+1..end]
    int k = start;    // cursor into the scratch buffer
    vector<double> b(array.size());

    // Merge the two sorted halves into b.
    while (i <= mid && j <= end) {
        if (array[i] <= array[j])
            b[k++] = array[i++];
        else
            b[k++] = array[j++];
    }

    // Copy whatever remains of either half.
    while (i <= mid)
        b[k++] = array[i++];

    while (j <= end)
        b[k++] = array[j++];

    // Copy the merged range back into array.
    for (k = start; k <= end; k++)
        array[k] = b[k];
}

This part should be O(n),

and this other part should be O(n lg n), where lg is the logarithm base 2:

void mergeSort(vector<double> &array, int start, int end) {
    if (start < end) {
        int mid = (end - start) / 2 + start;  // overflow-safe midpoint
        mergeSort(array, start, mid);         // sort the left half
        mergeSort(array, mid + 1, end);       // sort the right half
        merge(array, start, mid, end);        // merge the sorted halves
    }
}
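For reference, that expectation follows from the standard divide-and-conquer recurrence; assuming merge really is Θ(n), the master theorem gives

$$T(n) = 2\,T(n/2) + \Theta(n) \;\Longrightarrow\; T(n) = \Theta(n \lg n)$$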

I ran some experiments with random vector instances of sizes 1000 (10^3), 10000 (10^4), 50000 (5·10^4), 100000 (10^5), 250000 (2.5·10^5), and 500000 (5·10^5), with 30 instances of each size. These are the average results for each instance size:

1000 - ~0.000 s
10000 - 0.344 s
50000 - 20.456 s
100000 - 59.083 s
250000 - 360.814 s
500000 - 1729.245 s

All elapsed times were taken with the Linux `time` command (user time) while running my merge sort. As you can see, this is not O(n lg n) behavior: going from 100000 to 500000 elements, an O(n lg n) algorithm should slow down by about a factor of 5 · lg(500000)/lg(100000) ≈ 5.7, but the measured factor is 1729.245/59.083 ≈ 29. What am I missing here? I don't know if it's relevant, but my system configuration is:

OS: Fedora 18 - 64 bits
Processor: Intel® Core™ i3 CPU M 380 @ 2.53GHz × 4
Memory: 2.8 GiB


Hugo Oshiro
  • Did you run debug code or optimized code? The difference can be huge, because some of your loops could be parallelized. Theoretically, the `merge` algorithm is `O(n)` in terms of comparisons, but you do much more than that: a ton of accesses, increments, and assignments, and you even allocate a new vector. All of this eats away at performance (more specifically, poorly cached memory and cold memory hurt performance). – Apr 19 '14 at 04:35
  • Try changing `vector<double> b(array.size());` to `vector<double> b(end - start + 1);` and using `k = 0`. Your `b` is much bigger than it should be; not sure that is the problem. – imreal Apr 19 '14 at 04:35
  • That is probably your problem; `vector` fill-construction takes linear time. – imreal Apr 19 '14 at 04:41

2 Answers


Here's the culprit:

vector<double> b(array.size());

Suppose you start with a vector of half a million entries. The initial call to mergeSort is going to call mergeSort on that vector of half a million entries, but have it sort only the first 250,000 elements. (Then it will repeat on the second half.) The next call to mergeSort will again receive the full 500,000-element array, and will call mergeSort to sort the first and then the second 125,000 elements of the array. And so on. Each step of the way, mergeSort receives that vector of half a million entries but sorts only a subset of it. Eventually you'll be calling merge, where a temporary array of half a million elements is allocated and initialized on each and every call.

The result is n²·log(n) behavior. That's not exponential behavior, but it's still not good.
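As a rough consistency check against the timings in the question: quadratic-dominated growth predicts that going from 100,000 to 500,000 elements should take about 5² = 25 times as long (or about 5² · lg(500000)/lg(100000) ≈ 28.5 times with the log factor), and the measured ratio is 1729.245/59.083 ≈ 29.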

I see three different solutions:

  • Allocate that temporary b once and pass it as an argument to mergeSort and merge (see the sketch after this list).
  • Allocate a temporary array in merge that is of size end-start+1. Now you'll have to use offsets to deal with the fact that b[0] corresponds to array[start].
  • Merge in place. You don't need a temporary here. However, this is nontrivial and will make the algorithm an O(N*(log(N))^2) algorithm.
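A minimal sketch of the first option, assuming a one-argument wrapper replaces the original mergeSort(array, 0, array.size() - 1) entry point (the wrapper and the extra b parameter are illustrative, not part of the original code):

#include <vector>
using std::vector;

// Same merge logic as in the question, but writing into a
// caller-owned scratch buffer instead of fill-constructing
// a new vector on every call.
void merge(vector<double> &array, vector<double> &b,
           int start, int mid, int end) {
    int i = start, j = mid + 1, k = start;
    while (i <= mid && j <= end)
        b[k++] = (array[i] <= array[j]) ? array[i++] : array[j++];
    while (i <= mid)
        b[k++] = array[i++];
    while (j <= end)
        b[k++] = array[j++];
    for (k = start; k <= end; k++)
        array[k] = b[k];
}

void mergeSort(vector<double> &array, vector<double> &b, int start, int end) {
    if (start < end) {
        int mid = (end - start) / 2 + start;
        mergeSort(array, b, start, mid);
        mergeSort(array, b, mid + 1, end);
        merge(array, b, start, mid, end);
    }
}

// Illustrative wrapper: the scratch buffer is allocated exactly once.
void mergeSort(vector<double> &array) {
    if (array.empty()) return;
    vector<double> b(array.size());
    mergeSort(array, b, 0, static_cast<int>(array.size()) - 1);
}

The second option is the same idea with a buffer of size end - start + 1 allocated inside merge, indexing b[k - start] in place of b[k].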
David Hammen

It seems that the allocation of the vectors is taking much of the time; creating and filling a vector is not an O(1) operation. Try changing the vector to a basic C-style array, and you will notice the difference. Also, from what I see of the values, it is in no way exponential. A higher-degree polynomial, maybe.
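A quick sketch of that suggestion, assuming the intent is a plain heap-allocated scratch buffer sized to the merged range (this illustrates the idea rather than reproducing any original code; it also sidesteps the fill-construction cost noted in the comments):

#include <vector>
using std::vector;

void merge(vector<double> &array, int start, int mid, int end) {
    int n = end - start + 1;
    double *b = new double[n];          // uninitialized C-style scratch buffer
    int i = start, j = mid + 1, k = 0;  // k indexes b from 0
    while (i <= mid && j <= end)
        b[k++] = (array[i] <= array[j]) ? array[i++] : array[j++];
    while (i <= mid)
        b[k++] = array[i++];
    while (j <= end)
        b[k++] = array[j++];
    for (k = 0; k < n; k++)
        array[start + k] = b[k];
    delete[] b;
}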

HelloWorld123456789