1

As a first attempt to get into merge sort i produced the following code which works on strings because they are easier than lists to deal with.

class Program
{
    static int iterations = 0;
    static void Main(string[] args)
    {            
        string test = "zvutsrqponmlihgfedcba";
        test = MergeSort(test);
        // test is sorted after 41 iterations
    }

    static string MergeSort(string input)
    {
        iterations++;
        if (input.Length < 2)
            return input;
        int pivot = 0;
        foreach (char c in input)
            pivot += c;
        pivot /= input.Length;
        string left = "";
        string right = "";
        foreach (char c in input)            
            if (c <= (char)pivot)
                left += c;
            else
                right += c;            
        return string.Concat(new string[] { MergeSort(left), MergeSort(right) });
    }
}

Reading on Wikipedia about possible optimizations i found the following hint "To make sure at most O(log N) space is used, recurse first into the smaller half of the array, and use a tail call to recurse into the other." but honestly i have no idea how to apply this to my case. I have some vague memories of tail calls from my IT class when we were taught about recursion and factorials but i really can't understand how to apply Wikipedia's advice to my piece of code.

Any help would be much appreciated.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
user1909612
  • 273
  • 5
  • 14
  • [C# doesn't support tail-call optimization.](http://stackoverflow.com/questions/491376/why-doesnt-net-c-optimize-for-tail-call-recursion) – Robert Harvey May 23 '13 at 17:57
  • @RobertHarvey Dangit, I was just about to post that. – cdhowie May 23 '13 at 17:57
  • @RobertHarvey: Well, it's complicated; the x64 jitter will generate tail calls in some circumstances even though the C# compiler never produces a tail call instruction, but this isn't one of them. But this question is about "manually" rewriting a program to be tail recursive. – Eric Lippert May 23 '13 at 17:59
  • Oh ok thanks for the answer, i didn' t know about that. I suppose i should' ve informed myself before asking but wikipedia made me panic. Also thanks @torrential coding for editing my question to a more appropriate format. – user1909612 May 23 '13 at 18:02
  • 2
    This isn't mergesort; this is quicksort. – Eric Lippert May 23 '13 at 18:10
  • In fact, I am no longer sure if it is a correct implementation at all. QuickSort is an in-place sorting algorithm, and it involves swapping elements in the end. I do not think Concat will quite do that trick... – AlexK May 23 '13 at 19:06
  • @AlexK: QuickSort does not have to be in-place, though it typically is. The QuickSort algorithm is: choose a pivot, divide the list into two lists where the elements of the first list are all smaller than the pivot and the elements of the second are all greater than or equal to the pivot. Sort both sublists recursively. Concatenate the results. You typically do it in-place because if you do it externally like this, you end up allocating roughly O(n lg n) extra space. Which is why the original poster's concern about bounding the stack depth is a bit odd. – Eric Lippert May 23 '13 at 21:02
  • @Eric: I stand corrected. Looks like it is going to work =) – AlexK May 23 '13 at 21:48

2 Answers2

8

There are numerous problems with this question, starting with the fact that you've implemented a very slow version of QuickSort but asked a question about MergeSort. MergeSort is not typically implemented as a tail recursive algorithm.

Let me ask a better question on your behalf:

How do I turn a recursive algorithm into a tail-recursive algorithm?

Let me sketch out a simpler tail-recursive transformation and then you can work out how to apply that to your sort, should you decide that doing so is a good idea.

Suppose you have the following recursive algorithm:

static int Count(Tree tree)
{
    if (tree.IsEmpty) 
        return 0;
    return 1 + Count(tree.Left) + Count(tree.Right);
}

Let's break that down into more steps using the following somewhat bizarre transformation:

static int Count(Tree tree)
{
    int total = 0;
    Tree current = tree;
    if (current.IsEmpty) 
        return 0;
    total += 1;
    int countLeft = Count(current.Left);
    total += countLeft;
    current = current.Right;
    int countRight = Count(current);
    total += countRight;
    return total;
}

Notice that this is exactly the same program as before, just more verbose. Of course you would not write the program in such a verbose manner, but it will help us make it tail recursive.

The point of tail recursion is to turn a recursive call into a goto. We can do that like this:

static int Count(Tree tree)
{
    int total = 0;
    Tree current = tree;

 Restart:

    if (current.IsEmpty) 
        return total;
    int countLeft = Count(current.Left);
    total += 1;
    total += countLeft;
    current = current.Right;
    goto Restart;
}

See what we've done there? Instead of recursing, we reset the current reference to the thing that would have been recursed on, and go back to the start, while maintaining the state of the accumulator.

Now is it clear how to do the same thing to the QuickSort algorithm?

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • Thank you very much, even if my language of choice doesn' t support tail recursion your answer still helped me a lot to understand it better. I will wait just a bit before marking the asnwer as accepted because i don' t want to bee too hasty. – user1909612 May 23 '13 at 18:18
  • Of course, if this was real code, using `while` would be a better options that `goto`. – svick May 23 '13 at 19:22
  • @Eric Lippert: Can the Microsft C# Compiler convert the `Count()` from example #1, to #3? – Jack May 23 '13 at 20:06
  • 1
    @svick: I see very little difference between `while(true) { s; }` and `L: s; goto L;`. The point of the `goto` is to illustrate what a tail call really is; hiding the `goto` in a `while` seems more confusing than illuminating. Of course were I to write this program using better style it would be something like `int total = 0; for(Tree current = tree; !current.IsEmpty; current = current.Right) total += 1 + Count(current.Left); return total;` This isn't intended to be good style, it's intended to make the tail call transformation clear. – Eric Lippert May 23 '13 at 20:57
  • 1
    @Jack: No. The C# compiler does no tail call optimizations at all. The x64 jitter will sometimes perform tail call optimizations but this kind of transformation is out of its league. – Eric Lippert May 23 '13 at 20:58
  • I mean *Microsoft bad time to make a mistake. Thanks very much. So,most part of optmizations are in the jitter, instead of compiler? but if so,why some peoples have on mind that a C# program is fastest than VB.NET? – Jack May 23 '13 at 22:52
  • @Jack: You'll have to ask those people why they have that belief. I can't read minds; I don't know why people believe the things they believe. – Eric Lippert May 23 '13 at 22:56
  • @Jack: If you are interested to know what optimizations the C# compiler performs, see my article on that subject: http://blogs.msdn.com/b/ericlippert/archive/2009/06/11/what-does-the-optimize-switch-do.aspx – Eric Lippert May 23 '13 at 22:57
  • I'm sorry if the last question isn't a real question or stupid. I'll check out the link. Thanks. – Jack May 24 '13 at 00:24
3

This looks like a less-than-optimal variant of QuickSort, not a MergeSort. You are missing a C# equivalent of this part:

function merge(left, right)
    var list result
    while length(left) > 0 or length(right) > 0
        if length(left) > 0 and length(right) > 0
            if first(left) <= first(right)
                append first(left) to result
                left = rest(left)
            else
                append first(right) to result
                right = rest(right)
        else if length(left) > 0
            append first(left) to result
            left = rest(left)
        else if length(right) > 0
            append first(right) to result
            right = rest(right)
    end while
    return result
AlexK
  • 1,279
  • 8
  • 19
  • I am studying your answer, by giving it a brief look i guess i got the theory wrong. – user1909612 May 23 '13 at 18:06
  • Ok correct me if i am wrong but isn' t it the same thing to recursively call string.concat ? Why is it more optimal this other way ? – user1909612 May 23 '13 at 18:15
  • Calling string.Concat just appends one string to another. Merge acts differently. Let's say a ="1357" and b ="468". Concat(a,b) returns "1357468" . merge(a,b) returns "1345678". – AlexK May 23 '13 at 18:22
  • If any consolation, Quicksort is faster than Mergesort in a long run =) If you really want Mergesort though, please study this pseudocode carefully:http://en.wikipedia.org/wiki/Merge_sort – AlexK May 23 '13 at 18:23
  • Well, QuickSort is *on average* faster than MergeSort. But the nice thing about MergeSort is that it is *guaranteed* to be O(n lg n) in time. A naively-implemented QuickSort will typically be faster, but can be as slow as O(n squared) in some cases. – Eric Lippert May 23 '13 at 21:05