6

I'm reading chapter about sorting in Sedgewick's "Algorithms". Along the way I wrote 3 basic algorithms for sorting: selection, insertion and shell sort. The book says, that, despite the fact that all three have quadratic worst case complexity, shell sort should be much faster than insertion sort on random data. In the book they get 600x performance boost.

But I get following multipliers (almost do not change with increasing of array size) on my laptop:

  • selection: 5.5x
  • insertion: 1x
  • shell: 1.8x !

The question that bothers me is - why shell sort is almost two times slower, than insertion sort?!

I guess, something is wrong with my shellsort implementation. But I almost copied it from the book:

class ShellSort extends Sort {
    //precalculate sequence: 1, 4, 13, 40, 121, 364, 1093, ...
    //(3^20 - 1)/2 is enough for any array I sort
    private static final int[] SEQUENCE = new int[20];
    static {
        for(int i = 1; i <= SEQUENCE.length; i++)
            SEQUENCE[i - 1] = (int)(Math.pow(3, i) - 1) / 2;
    }

    public void sort(int[] a) {
        int length = a.length;

        int seqLen = SEQUENCE.length;
        int nth;
        int j;

        for(int seqi = seqLen - 1; seqi >= 0; seqi--) {
            if(SEQUENCE[seqi] < length / 3) {
                nth = SEQUENCE[seqi];
                for(int n = 0; n < length; n+=nth) {
                    j = n;
                    while(j > 0 && a[j] < a[j - nth]) {
                        exch(a, j, j-nth);
                        j -= nth;
                    }
                }
            }
        }
    }
}

Rest of the code for those, who would like to run test on their machine (doubling array size test with JVM heat up has no significant effect on the results, so this simple test is good enough for N > ~ 200 000).

main:

int N = 500_000;
Random rand = new Random();
int[] a = new int[N];
for(int i = 0; i < N; i++)
    a[i] = rand.nextInt();

//insertion sort
int[] aCopy = Arrays.copyOf(a, a.length);
long start = System.nanoTime();
new InsertionSort().sort(aCopy);
System.out.println("insert:\t" + (System.nanoTime() - start));

//shell sort
aCopy = Arrays.copyOf(a, a.length);
start = System.nanoTime();
new ShellSort().sort(aCopy);
System.out.println("shell:\t" + (System.nanoTime() - start));

InsertionSort and Sort classes:

class InsertionSort extends Sort {
    public void sort(int[] a) {
        int length = a.length;
        int j;
        int x;
        for(int i = 1; i < length; i++) {
            j = i;
            x = a[i];
            while(j > 0 && x < a[j-1]) {
                a[j] = a[--j];
            }
            a[j] = x;
        }
    }
}
abstract class Sort {
    abstract public void sort(int[] a);

    protected static final void exch(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
Oroboros102
  • 2,214
  • 1
  • 27
  • 41
  • 2
    It would be a good idea to follow the guidelines here - [How do I write a correct micro-benchmark in Java?](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) - before putting too much weight into your results. – Bernhard Barker Feb 27 '14 at 07:22
  • 1
    And never try to explain empirical results before you prove they are **[statistically significant](http://en.wikipedia.org/wiki/Statistical_significance)**. – amit Feb 27 '14 at 07:29
  • I ran all that code in more sophisticated benchmark with jvm "heat up" and got same results. The reason is - inner loops quickly get JITed and no significant GC overhead introduced. – Oroboros102 Feb 27 '14 at 07:30
  • @amit To make tests more statistically significant I ran test using exponential array size growth (N = 2^k, k = 1, 2, 3 ...). Same results. – Oroboros102 Feb 27 '14 at 07:31
  • Not sure it impacts much the results, but the book says *100,000 random Double* not *int*. Also, to ensure no optimization is performed, try to program the algo in C, using *gcc* without optimizations. – Déjà vu Feb 27 '14 at 07:53
  • @ring0 The more - the better (more precise multiplier). I'll keep rewriting in C as my last option. – Oroboros102 Feb 27 '14 at 07:56

4 Answers4

3

From a quick glance you can see that shell sort looks slower by having more loops. Brute force, you can put a system.out.println in the innermost loop to see how many comparisons are made.

3 Loops of shellsort

  • for(int seqi = seqLen - 1; seqi >= 0; seqi--)
  • for(int n = 0; n < length; n+=nth)
  • while(j > 0 && a[j] < a[j - nth])

2 Loops of insertion

  • for(int i = 1; i < length; i++)
  • while(j > 0 && x < a[j-1])
jcalloway
  • 1,155
  • 1
  • 15
  • 24
  • Shell sort has ~0.8 of Insertion sort loops. But still slower and not 600x (like in the book). – Oroboros102 Feb 27 '14 at 07:27
  • Self fix: "still slower but not 600x faster" – Oroboros102 Feb 27 '14 at 07:35
  • @Oroboros102 so another look at your code. I would probably not time the constructor calls. Not sure when the static initializers run. Also try timing the shellsort prior to the insertion sort. see if you still get same effects. – jcalloway Feb 27 '14 at 08:04
3

Your implementation is broken and outputs the sorted array only due to the fact that the last step is 1 and your two internal cycles perform the basic insertion sort when the step is 1. When the step is greater then 1, the two internal cycles in your implementation do anything but step-sort the array, so what you implementation does is it shuffles the array in all iterations of the outer cycle and then insertion-sorts it in the last iteration of the outer cycle. Of course it will take longer then just insertion-sort it once.

Reusing your sequence the proper shell sort implementation should look like this:

public void sort( int[] a ) {
    int length = a.length;

    int stepIndex = 0;
    while ( stepIndex < SEQUENCE.length - 1 && SEQUENCE[ stepIndex ] < length / 3 ) {
        stepIndex++;
    }

    while ( stepIndex >= 0 ) {
        int step = SEQUENCE[ stepIndex-- ];
        for ( int i = step; i < length; i++ ) { // DIFF: i++ instead of i+=step
            for ( int j = i; j >= step && a[ j ] < a[ j - step ]; j -= step ) {
                exch( a, j, j - step );
            }
        }
    }
}

Two main differences between this implementation and yours:

  • proper initial indexes for two internal cycles
  • proper index increment for middle cycle (+1 instead of +step in your code)

Also, check the http://algs4.cs.princeton.edu/21elementary/Shell.java.html for a good implementation and good step sequence.

Oleg Estekhin
  • 8,063
  • 5
  • 49
  • 52
0

I believe reason would be caching. Shell sort has lots of (kinda) random accesses so more cache misses. I believe it would give worse performance with newer hardwares. Insertion sort pretty much always works on same areas of memory so it performs better

  • Your hypothesis more looks like truth. How can I verify it? – Oroboros102 Feb 27 '14 at 07:33
  • 1
    http://stackoverflow.com/questions/10082517/simplest-tool-to-measure-c-program-cache-hit-miss-and-cpu-time-in-linux looks like `perf stat ./a.out` shoud work. But I am not sure if it would be as effective for a java app –  Feb 27 '14 at 07:38
  • Looks like cache misses are not the issue. Insertion sort: 161M references and 230K misses, Shell: 167M and 300K respectively. – Oroboros102 Feb 27 '14 at 07:51
  • More interesting is that shell sort has much more (~2x) "cycles" and "branches". That looks like algorithm problem, but I can't see it in my code. – Oroboros102 Feb 27 '14 at 07:54
0

Your implementation of Shellsort is not correct. You are not comparing each element with a given step in the array. You should also use insertion sort and shell sort on medium size arrays only to get the best out of them. You can use generics to remove all warnings. You can refer also to this increment sequence to get better results.

Here is the shell sort to do the same:

public class ShellSort<T> {

    //Knuth's function for shell sort
    //0(n^3/2) in practice much better
    public static List<Integer> shellSortIncrementSeqFunction(int length) {
        List<Integer> res = new ArrayList<>();
        for(int i = 0; i < length; i++)
                res.add(3 * i + 1);
        return res;
    }

    //Good function tough to compete
    private List<Integer> shellSortIncrementSeqFunctionGood(int length) {
        int argForFunction1 = 0;
        int argForFunction2 = 2;
        List<Integer> res = new ArrayList<>();
        while(true) {
            int val1 = shellSortArgFunction1(argForFunction1);
            int val2 = shellSortArgFunction2(argForFunction2);
            if(val1 < val2) {
                if(val1 >= length)
                    break;
                res.add(val1);
                argForFunction1++;
            } else {
                if(val2 >= length)
                    break;
                res.add(val2);
                argForFunction2++;
            }
        }
        return res;
    }

    private int shellSortArgFunction1(int arg) {
        return (int)(9 * Math.pow(4, arg) - 9 * Math.pow(2, arg) + 1);
    }

    private int shellSortArgFunction2(int arg) {
        return (int)(Math.pow(4, arg) - 3 * Math.pow(2, arg) + 1);
    }


    @SuppressWarnings("unchecked")
    private boolean less(Comparable<T> thisComparable, Comparable<T> thatComparable) {
        return thisComparable.compareTo((T) thatComparable) < 0;
    }

    private void exchange(Object[] comparableArray, int thisIndex, int thatIndex) {
        Object temp = comparableArray[thisIndex];
        comparableArray[thisIndex] = comparableArray[thatIndex];
        comparableArray[thatIndex] = temp;
    }

    public boolean isSorted(Comparable<T>[] comparableArray) {
        boolean res = true;
        int len = comparableArray.length;
        for(int i = 1; i < len; i++)
            res &= !less(comparableArray[i], comparableArray[i - 1]);
        return res;
    }

    public void shellSort(Comparable<T>[] comparableArray) {
        int len = comparableArray.length;
        List<Integer> incrementSequence = shellSortIncrementSeqFunctionGood(len);
        int seqLen = incrementSequence.size() - 1;
        for(int i = seqLen; i >= 0; i--) {
            int step = incrementSequence.get(i);
            //h-sorting the array
            for(int j = step; j < len; j++)
                for(int k = j; k >= step && less(comparableArray[k], comparableArray[k-step]); k -= step)
                    exchange(comparableArray, k, k-step);
        }
        assert isSorted(comparableArray);
    }

    private void show(Comparable<T>[] comparableArray) {
        Arrays.stream(comparableArray).forEach(x -> System.out.print(x + ", "));
        System.out.println();
    } 

}

You can view the source for this post here.

The Flash
  • 33
  • 5