2

While studying the Merge Sort algorithm, I was curious to know if this sorting algorithm can be further optimised. I found out that there exists an iterative version of Merge Sort with the same time complexity but an even better O(1) space complexity. And the iterative approach is always better than the recursive approach in terms of performance. Then why is it less common and rarely talked about in any regular algorithms course? Here's the link to the iterative Merge Sort algorithm

Akash Dahane
  • `And the iterative approach is always better than the recursive approach in terms of performance.` See: [Quicksort: Iterative or Recursive](https://stackoverflow.com/q/12553238/572670). – amit Mar 18 '21 at 17:15
  • As for the rest of the question: the word "preferred" by its nature is "opinion based", and thus not a good fit for Stackoverflow, and the whole question revolves around the preference of one over the other. – amit Mar 18 '21 at 17:16
  • @amit I find that claim dubious. It does not match my experience. It does not match the benchmark in your link. And it does not match how CPU caches work. Once you've loaded a section of data, doing all the work you can with that before you move on is better than having it drop out of cache and be reloaded repeatedly. – btilly Mar 18 '21 at 17:18
  • 1
    @btilly - A typical X86 processor has an 8 way associative cache. A bottom up 2 way merge sort uses 2 of those cache lines for the 2 inputs, and 1 of those cache lines for output. Bottom up merge sort is slightly faster because it isn't pushing and popping pairs of indexes to|from the stack, and for bottom up the indexes will be kept in registers, not memory. A 4 way bottom up merge sort with nested if's rather than a minheap is about 15% faster than 2 way. – rcgldr Mar 18 '21 at 17:50
  • As posted in my answer, most libraries use a variation of insertion sort and bottom up merge sort for a stable sort, for example, C++ std::stable_sort(). I'm not aware of any library that uses top down merge sort for arrays or vectors. Top down merge sort is mostly used for educational purposes, not for actual usage. Auxiliary space can be reduced to just local variables, but such implementations are about 50% slower than conventional merge sort, and these are mostly research projects, not intended for actual usage. – rcgldr Mar 19 '21 at 16:10
  • Visual Studio C++ std::stable_sort() uses a hybrid insertion + bottom up merge sort. Java arrays.sort() uses a variation of hybrid insertion + bottom up merge sort known as [timsort](https://en.wikipedia.org/wiki/Timsort). – rcgldr Mar 19 '21 at 17:57
  • rcgldr - timsort is a natural mergesort with a boost of 15 elements: if the number of elements selected is less than that then, as you say, it does insertion sort on that. – SJHowe Nov 08 '21 at 23:02

3 Answers

3

If you think that it has O(1) space complexity, look again. They have the original array A of size n, and an auxiliary temp also of size n. (It actually only needs to be n/2 but they kept it simple.)

And the reason why they need that second array is that when you merge, you copy the bottom region out to temp, then merge back starting from where it was.
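A minimal sketch of that step (my own illustration, not the linked code), using only the half-size buffer mentioned above: the bottom (left) run is copied out to temp, then merged back starting from where it was.

```cpp
#include <cstddef>
#include <vector>

// Merge sorted runs a[lo..mid) and a[mid..hi) using a buffer of only
// (mid - lo) elements: copy the left run out, then merge back in place.
void mergeWithHalfBuffer(std::vector<int>& a, std::size_t lo,
                         std::size_t mid, std::size_t hi) {
    std::vector<int> temp(a.begin() + lo, a.begin() + mid); // left run
    std::size_t l = 0, r = mid, o = lo;
    while (l < temp.size() && r < hi)
        a[o++] = (temp[l] <= a[r]) ? temp[l++] : a[r++];
    while (l < temp.size())             // flush what's left of the left run;
        a[o++] = temp[l++];             // the right run's tail is already in place
}
```

The write index `o` can never catch up to the unread right index `r`, which is why merging back over the original region is safe.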

So the tradeoff is this. A recursive solution involves a lot fewer fiddly bits and makes the concepts clearer, but adds an O(log(n)) memory overhead on top of the O(n) memory overhead that both solutions share. When you're trying to communicate concepts, that's a straight win.
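For concreteness, a minimal top-down sketch (an illustration of the idea, not the linked implementation); the recursion depth is where the O(log(n)) overhead comes from, on top of the shared O(n) temp buffer:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Recursive top-down merge sort of a[lo..hi); tmp is a shared scratch
// buffer of the same size as a. Recursion depth is O(log n).
void topDownSort(std::vector<int>& a, std::vector<int>& tmp,
                 std::size_t lo, std::size_t hi) {
    if (hi - lo < 2)
        return;                          // runs of 0 or 1 are already sorted
    std::size_t mid = lo + (hi - lo) / 2;
    topDownSort(a, tmp, lo, mid);        // sort left half
    topDownSort(a, tmp, mid, hi);        // sort right half
    std::merge(a.begin() + lo, a.begin() + mid,
               a.begin() + mid, a.begin() + hi, tmp.begin() + lo);
    std::copy(tmp.begin() + lo, tmp.begin() + hi, a.begin() + lo);
}
```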

Furthermore in practice I believe that recursive is also a win.

In the iterative approach you keep making full passes through your entire array. In the case of a large array, that means data comes into the cache for a pass, gets manipulated, and then falls out as you load the rest of the array, only to have to be loaded again for the next pass.

In the recursive approach, by contrast, for the operations that correspond to the first few passes you load a small section into cache, completely sort it, then move on. (How many passes you get this win for depends heavily on data type, memory layout, and the size of your CPU cache.) You are only loading/unloading from cache when you're merging more data than fits into cache. Algorithms courses generally omit such low-level details, but they matter a lot to real-world performance.
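The pass structure in question looks roughly like this (a simplified illustration, not the linked code); note that every pass reads and writes the whole array:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Iterative bottom-up merge sort: run width doubles each pass, and
// every pass traverses the entire array once.
void bottomUpMergeSort(std::vector<int>& a) {
    std::size_t n = a.size();
    std::vector<int> tmp(n);                     // O(n) auxiliary buffer
    for (std::size_t w = 1; w < n; w *= 2) {     // one full pass per width
        for (std::size_t lo = 0; lo < n; lo += 2 * w) {
            std::size_t mid = std::min(lo + w, n);
            std::size_t hi  = std::min(lo + 2 * w, n);
            std::merge(a.begin() + lo, a.begin() + mid,
                       a.begin() + mid, a.begin() + hi,
                       tmp.begin() + lo);
        }
        std::copy(tmp.begin(), tmp.end(), a.begin()); // whole-array traffic
    }
}
```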

btilly
  • Most library implementations will use insertion sort to create small sorted runs (32 to 256), which eliminates the first few passes, then switch to bottom up merge sort. The same idea can be used for recursive top down merge sort, but the cost of pushing + popping pointers and indexes to|from the stack ends up making recursive top down merge sort a tiny bit slower than iterative bottom up merge sort. – rcgldr Mar 18 '21 at 18:28
  • I've benchmarked naive top down and bottom up merge sort on my system, and bottom up is slightly faster. Any advantage to cached data is offset by stack operations, versus global optimizations which tend to keep pointers and indexes in registers. – rcgldr Mar 18 '21 at 18:49
  • @rcgldr This will be highly dependent on your language, data types, and size of data. How big a data set did you sort, what language was it written in, and what data type? – btilly Mar 18 '21 at 20:40
  • 1
    Most of my comparisons were done sorting 2^24 = 16777216 pseudo random 64 bit unsigned integers. I tested using C and C++, on Windows 7 or 10 64 bit, using Visual Studio 2015 or 2019. Top down takes about 1.64 seconds, bottom up about 1.57 seconds. For hybrid insertion + merge sort, top down 1.45 seconds, bottom up 1.41 seconds. Faster still is 4 way (no minheap) bottom up merge sort, which takes 1.36 seconds, and I think hybrid insertion + 4 way would be about 1.26 seconds. – rcgldr Mar 18 '21 at 21:16
  • Since Java doesn't optimize pointers and indexes into registers, with Java top down would probably be a bit faster than bottom up, but if time is an important factor, I wouldn't be using Java, and I would be using a hybrid insertion + 4 way merge sort. – rcgldr Mar 18 '21 at 21:16
  • @rcgldr Interesting. My experiences some years ago was with large enough data sets that I was forced to code custom external sorts. (I was doing a couple of other things with the merge so it needed to be custom.) And the number of passes to disk had an overwhelming impact. I have done other experiments since with data in the mere hundreds of MB, but didn't look at questions like what went into registers. – btilly Mar 18 '21 at 21:22
  • But one answer on stackoverflow claims to have an iterative Merge Sort on a `list data structure` with O(1) auxiliary space complexity. Reading all these answers I came up with the question. Link: https://stackoverflow.com/a/43560935/12064109 – Akash Dahane Mar 19 '21 at 02:42
  • 1
    @AkashDahane That answer did it with linked lists, not arrays. Yes, if you have the `O(n)` overhead of all of the pointers that are part of a linked list data structure, it is possible to sort it with only `O(1)` more of overhead. But if you start with an array then you need `O(n)`. (And the `O(n)` for the array has a smaller constant than the `O(n)` to have a linked list representation.) Also note that walking through a linked list of integers will perform far worse than an array of integers. – btilly Mar 19 '21 at 03:29
  • @AkashDahane - When Visual Studio std::list::sort switched from bottom up using an internal array of std::list to using iterators to avoid non-default allocation issues, they also switched to top down, which wasn't needed; at the time I assumed the switch was made due to some unknown issue with bottom up. However, when I later investigated this on my own, I had no issues implementing bottom up, with the only "clever" part being the use of std::list::end() as an iterator to an empty list. Example code can be found in [my answer](https://stackoverflow.com/questions/40622430/40629882#40629882). – rcgldr Mar 19 '21 at 03:32
  • @btilly - getting back to the original question about sorting arrays or vectors, again I am not aware of any library that uses recursive top down merge sort. Visual Studio and other C++ compilers std::stable_sort() use a hybrid insertion + bottom up merge sort. Java arrays.sort() uses a compiled variation of hybrid insertion + bottom up merge sort called [timsort](https://en.wikipedia.org/wiki/Timsort). – rcgldr Mar 21 '21 at 16:45
1

> Found out that there exists Iterative version of Merge Sort algorithm with same time complexity but even better O(1) space complexity

The iterative, bottom-up implementation of Merge Sort you linked to doesn't have O(1) space complexity. It maintains a copy of the array, so it has O(n) space complexity. As a consequence, the additional O(log n) stack space (for the recursive implementation) is irrelevant to the total space complexity.

In the title of your question, and in some comments, you use the words "auxiliary space complexity". This is what we usually mean by space complexity, but you seem to suggest the term means constant space complexity. It does not: "auxiliary" refers to the space used other than the space occupied by the input, and it tells us nothing about the actual complexity.

trincot
  • So for the recursive Merge Sort algorithm, space complexity is `O(n + log n)`, i.e. `O(n)`, and for the iterative one it's simply `O(n)`? – Akash Dahane Mar 18 '21 at 18:27
  • Yes, so both have the same space complexity. – trincot Mar 18 '21 at 19:32
  • And what if we use this algorithm on list instead of array? Does it make any difference in Space complexity? – Akash Dahane Mar 19 '21 at 02:35
  • This algorithm would obviously need to look different, as in a linked list you don't have random access like in an array. If you have a new question, then please use the Ask Question button. – trincot Mar 19 '21 at 08:22
1

Recursive top down merge sort is mostly educational. Most actual libraries use some variation of a hybrid of insertion sort and bottom up merge sort: insertion sort is used to create small sorted runs, sized so that they will be merged in an even number of merge passes, so that merging back and forth between the original and temp arrays ends up with the sorted data in the original array. (That leaves no copy operation in the merge other than for a singleton run at the end of an array, which can itself be avoided by choosing an appropriate initial run size for the insertion sort. Note: this is not done in my example code, which only uses run size 32 or 64, while a more advanced method like Timsort does choose an appropriate run size.)

Bottom up is slightly faster because the array pointers and indexes will be kept in registers (assuming an optimizing compiler), while top down is pushing|popping array pointers and indexes to|from the stack.


Although I'm not sure that the OP actually meant O(1) space complexity for a merge sort, it is possible, but it is about 50% slower than a conventional merge sort with O(n) auxiliary space. It's mostly a research (educational) effort now, and the code is fairly complex. One of the options in the example code linked below is no extra buffer at all. The benchmark table is for a relatively small number of keys (max is 32767 keys). For a large number of keys, this example ends up about 50% slower than an optimized insertion + bottom up merge sort (std::stable_sort is generalized, such as using a pointer to function for every compare, so it is not fully optimized).

https://github.com/Mrrl/GrailSort


Example hybrid insertion + bottom up merge sort C++ code:

#include <algorithm>                        // for std::swap
#include <cstddef>                          // for size_t

void MergeSort(int a[], int b[], size_t n); // prototypes
void Merge(int a[], int b[], size_t ll, size_t rr, size_t ee);
size_t GetPassCount(size_t n);

void MergeSort(int a[], size_t n)           // entry function
{
    if(n < 2)                               // if size < 2 return
        return;
    int *b = new int[n];
    MergeSort(a, b, n);
    delete[] b;
}

void MergeSort(int a[], int b[], size_t n)
{
    size_t s;                               // initial run size
    s = ((GetPassCount(n) & 1) != 0) ? 32 : 64; // odd pass count -> 32, else 64
    {                                       // insertion sort
        size_t l, r;
        size_t i, j;
        int t;
        for (l = 0; l < n; l = r) {
            r = l + s;
            if (r > n) r = n;
            l--;                            // start-1; size_t wraps to SIZE_MAX when
                                            // l == 0, serving as the inner-loop sentinel
            for (j = l + 2; j < r; j++) {
                t = a[j];
                i = j-1;
                while(i != l && a[i] > t){
                    a[i+1] = a[i];
                    i--;
                }
                a[i+1] = t;
            }
        }
    }

    while(s < n){                           // while not done
        size_t ee = 0;                      // reset end index
        size_t ll;
        size_t rr;
        while(ee < n){                      // merge pairs of runs
            ll = ee;                        // ll = start of left  run
            rr = ll+s;                      // rr = start of right run
            if(rr >= n){                    // if only left run
                rr = n;                     //   copy left run
                while(ll < rr){
                    b[ll] = a[ll];
                    ll++;
                }
                break;                      //   end of pass
            }
            ee = rr+s;                      // ee = end of right run
            if(ee > n)
                ee = n;
            Merge(a, b, ll, rr, ee);
        }
        std::swap(a, b);                    // swap a and b
        s <<= 1;                            // double the run size
    }
}

void Merge(int a[], int b[], size_t ll, size_t rr, size_t ee)
{
    size_t o = ll;                          // b[]       index
    size_t l = ll;                          // a[] left  index
    size_t r = rr;                          // a[] right index
    while(1){                               // merge data
        if(a[l] <= a[r]){                   // if a[l] <= a[r]
            b[o++] = a[l++];                //   copy a[l]
            if(l < rr)                      //   if not end of left run
                continue;                   //     continue (back to while)
            while(r < ee)                   //   else copy rest of right run
                b[o++] = a[r++];
            break;                          //     and return
        } else {                            // else a[l] > a[r]
            b[o++] = a[r++];                //   copy a[r]
            if(r < ee)                      //   if not end of right run
                continue;                   //     continue (back to while)
            while(l < rr)                   //   else copy rest of left run
                b[o++] = a[l++];
            break;                          //     and return
        }
    }
}

size_t GetPassCount(size_t n)               // return # passes
{
    size_t i = 0;
    for(size_t s = 1; s < n; s <<= 1)
        i += 1;
    return(i);
}
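The 32-or-64 initial run size above exists so that the number of merge passes is even, leaving the sorted data in a[] after the final std::swap. That parity rule can be checked standalone (passesFrom is a hypothetical helper, essentially GetPassCount generalized to any starting run size):

```cpp
#include <cstddef>

// Number of doubling passes needed to grow runs of size s to cover n
// elements (equals GetPassCount(n) when s == 1).
std::size_t passesFrom(std::size_t s, std::size_t n) {
    std::size_t p = 0;
    while (s < n) {
        s <<= 1;
        ++p;
    }
    return p;
}
```

With n = 2^24 = 16777216, passesFrom(1, n) is 24 (even), so the rule picks s = 64, and passesFrom(64, n) is 18 merge passes; since that count is even, the sorted data ends up back in the original array.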
rcgldr