4

I am trying to understand the 3-way radix Quicksort, and i dont understand why the the CUTOFF variable there? and the insertion method?

public class Quick3string {

    private static final int CUTOFF =  15;   // cutoff to insertion sort

    // sort the array a[] of strings
    public static void sort(String[] a) {
        // StdRandom.shuffle(a);
        sort(a, 0, a.length-1, 0);
        assert isSorted(a);
    }

    // return the dth character of s, -1 if d = length of s
    private static int charAt(String s, int d) { 
        assert d >= 0 && d <= s.length();
        if (d == s.length()) return -1;
        return s.charAt(d);
    }


    // 3-way string quicksort a[lo..hi] starting at dth character
    private static void sort(String[] a, int lo, int hi, int d) { 

        // cutoff to insertion sort for small subarrays
        if (hi <= lo + CUTOFF) {
            insertion(a, lo, hi, d);
            return;
        }

        int lt = lo, gt = hi;
        int v = charAt(a[lo], d);
        int i = lo + 1;
        while (i <= gt) {
            int t = charAt(a[i], d);
            if      (t < v) exch(a, lt++, i++);
            else if (t > v) exch(a, i, gt--);
            else              i++;
        }

        // a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi]. 
        sort(a, lo, lt-1, d);
        if (v >= 0) sort(a, lt, gt, d+1);
        sort(a, gt+1, hi, d);
    }

    // sort from a[lo] to a[hi], starting at the dth character
    private static void insertion(String[] a, int lo, int hi, int d) {
        for (int i = lo; i <= hi; i++)
            for (int j = i; j > lo && less(a[j], a[j-1], d); j--)
                exch(a, j, j-1);
    }

    // exchange a[i] and a[j]
    private static void exch(String[] a, int i, int j) {
        String temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    // is v less than w, starting at character d
    private static boolean less(String v, String w, int d) {
        assert v.substring(0, d).equals(w.substring(0, d));
        return v.substring(d).compareTo(w.substring(d)) < 0; 
    }


    // is the array sorted
    private static boolean isSorted(String[] a) {
        for (int i = 1; i < a.length; i++)
            if (a[i].compareTo(a[i-1]) < 0) return false;
        return true;
    }


    public static void main(String[] args) {

        // read in the strings from standard input
        String[] a = StdIn.readAll().split("\\s+");
        int N = a.length;

        // sort the strings
        sort(a);

        // print the results
        for (int i = 0; i < N; i++)
            StdOut.println(a[i]);
    }
}

from http://www.cs.princeton.edu/algs4/51radix/Quick3string.java.html

Nikita Rybak
  • 67,365
  • 22
  • 157
  • 181
Peiska
  • 1,789
  • 4
  • 17
  • 15
  • @peiska: not an answer (hence the comment)... How cute the "optimization" from the eighties ;) In this and age of multi-cores CPUs, real optimization is obtained through parallelization. I wrote my own correctly Java multi-threaded quicksort (now in production on hundreds of systems) and *that* is an optimization :) I talked a bit about here: http://stackoverflow.com/questions/2210185 Seen that you're asking about quicksort, I think it would be worth mentionning that the fastest quicksort (for real amount of data) nowadays are **definitely** the multithreaded ones. – SyntaxT3rr0r Jun 10 '10 at 18:20
  • Note that the value of CUTOFF is most likely pulled out of a hat. – Thorbjørn Ravn Andersen Jun 10 '10 at 18:34
  • @ThorbjørnRavnAndersen Don Knuth's hat, actually, with a proof. – user207421 Oct 17 '13 at 22:35

4 Answers4

8

It would appear to be used in order to invoke insertion sort for sufficiently small (size <= 15) arrays. This is most likely to speed up sorting.

Jan Gorzny
  • 1,732
  • 1
  • 16
  • 25
  • 1
    Ah, you _just_ beat me to it. Insertion sort "is relatively efficient for small lists and mostly-sorted lists," ( [Wikipedia](http://en.wikipedia.org/wiki/Sorting_algorithm#Insertion_sort) ) so this would be a reasonable optimization to "normal" three-way quicksort. The cutoff is a variable just so you can tweak the place where you want to switch to insertion sort. +1! – Pops Jun 10 '10 at 18:02
  • Insertion sort is O(N^2) and quicksort is on average O(N log N), but those are asymptotic results (for large N). At some low N there is crossover from insertion sort being better to quicksort being better. The crossover point is implementation and system dependent. – Justin Peel Jun 10 '10 at 18:16
  • @peiska, if this answer solved your problem (and it sounds like it did), you can click the hollow checkmark under the voting arrows to make it the accepted answer. That gives the answerer an extra 15 rep and keeps this question off the "Unanswered" list. And you get two rep for it too! – Pops Jun 10 '10 at 19:25
1

The sort method above is a recursive method. And every recursive method should have some kind of base case (otherwise it will keep calling itself, eventually leading to a stack overflow).

The insertion part is the base case in that method, because at every recursive step, the hi-lo difference keeps decreasing, & when its less than CUTOFF, the insertion sort will eventually be triggered, forcing the recursion to stop.

if (hi <= lo + CUTOFF) {       // base case
    insertion(a, lo, hi, d);
    return;
}

Now, why insertion ? Because it works well for small arrays. More on sorting here: http://www.sorting-algorithms.com/

Ashok Bijoy Debnath
  • 1,493
  • 1
  • 17
  • 27
1

This idea comes from Robert Sedgewick, who knows more about Quicksort than probably any man alive. It is cited in Donald E. Knuth, The Art of Computer Programming, Volume III, where he shows that for small M, insertion sort is faster than Quicksort, so he recommends not sorting small partitions < M at all and leaving it to one final insertion sort over the whole data set at the end. Knuth gives a function for calculating M (which is your CUTOFF), and which is 9 for his MIX pseudo-computer.

user207421
  • 305,947
  • 44
  • 307
  • 483
1

It's a simple optimization of quicksort algorithm. The cost of recursive calls in quicksort are quite high, so for small arrays insertion sort works better. So, the idea is, that if length of subarray to be sorted os below certain threshold, it's better to sort it using insertion sort than quicksort. In your example, CUTOFF variable defines that threshold, i.e. if less than 15 elements are left, they are sorted using insertion sort instead of quicksort.