Parallel sort slower than serial sort

Question

I was reading about the new features in Java 8 and one of them was the new Arrays.parallelSort() method. I made some tests sorting an array of doubles and one of Strings and for Strings the parallelSort was much slower.

Here is the content of a test method for Strings:

    final int size = 10000;
    final String[] values1 = new String[size];
    final String[] values2 = new String[size];
    for (int i = 0; i < size; i++) {
        values1[i] = Integer.toString(i);
        values2[i] = values1[i];
    }
    Collections.shuffle(Arrays.asList(values1));
    Collections.shuffle(Arrays.asList(values2));
    final Comparator<String> comparator = (o1, o2) -> o2.compareTo(o1);

    long startTimeInNano = System.nanoTime();
    Arrays.sort(values1, comparator);
    long endTimeInNano = System.nanoTime();
    System.out.println("Arrays.sort: totalTimeInMicro= " + ((endTimeInNano - startTimeInNano)/1000));

    //parallel sort with java 8
    startTimeInNano = System.nanoTime();
    Arrays.parallelSort(values2,comparator);
    endTimeInNano = System.nanoTime();
    System.out.println("Arrays.parallelSort: totalTimeInMicro= " + ((endTimeInNano - startTimeInNano)/1000));

The result was:

Arrays.sort: totalTimeInMicro= 11993

Arrays.parallelSort: totalTimeInMicro= 89823

I also tried this code on another computer and the result was the same (25608 vs 808660). The computer I run the tests has an i5-2500 CPU. Do you have any idea why I get this kind of results?

Could be due to thread creation overhead. Try sorting even larger arrays: it might be possible that there is an array size for which parallel sort is faster. — juhist, Mar 24 '15 at 13:43
Timing a single invocation (without even ramping up) isn't going to tell you much. — biziclop, Mar 24 '15 at 13:44
1. You should give a warm-up run before doing any micro bench-marking. 2. `Arrays.parallelSort()` uses `fork-join` framework. So it is directly related to the number of cores on a system (hence it is *architecture-dependent*). i-5 has 4 cores, so ideally `parallel sort` should be faster. — TheLostMind, Mar 24 '15 at 13:48
@Elemental The list isn't in the correct order though, as the elements are compared as strings, so `"1000"` < `"2"` — biziclop, Mar 24 '15 at 13:48
[This](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) may be of help in writing a more informative benchmark. — biziclop, Mar 24 '15 at 13:51
I am running a client JVM and it seems that after 200k values, the parallelSort starts to run faster. — Slimu, Mar 24 '15 at 13:58
@Slimu Or maybe that's where JIT kicks in, or who knows. Reducing the noise of the measurement is tricky. — biziclop, Mar 24 '15 at 13:59

score 7 · Accepted Answer · edited May 23 '17 at 12:24

This benchmark tells you hardly anything. The most important things for a microbenchmark are

Give the JIT a chance to optimize the code by running the tests multiple times
Use different input sizes
Print some of the results, to prevent the JIT from optimizing away the whole calls

There are some more points to consider - in fact, many more points. You should consult How do I write a correct micro-benchmark in Java? for further information.

For really "profound" information, you should use tools like Caliper or JMH. But even with little effort, one can create a microbenchmark that shows a rough indication of how the actual performance would be. So one of the simplest forms of a microbenchmark could look like this:

import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

public class ParallelSortSpeedTest
{
    public static void main(String[] args)
    {
        for (int size=100000; size<=1000000; size+=100000)
        {
            final String[] values1 = new String[size];
            final String[] values2 = new String[size];
            for (int i = 0; i < size; i++) {
                values1[i] = Integer.toString(i);
                values2[i] = values1[i];
            }
            Collections.shuffle(Arrays.asList(values1));
            Collections.shuffle(Arrays.asList(values2));
            final Comparator<String> comparator = (o1, o2) -> o2.compareTo(o1);

            testSort(values1, comparator);
            testParallelSort(values2, comparator);
        }
    }

    private static void testSort(
        String array[], final Comparator<String> comparator)
    {
        long startTimeInNano = System.nanoTime();
        Arrays.sort(array, comparator);
        long endTimeInNano = System.nanoTime();
        System.out.println("Arrays.sort        : totalTimeInMicro= " + 
            ((endTimeInNano - startTimeInNano)/1000)+", first "+array[0]);
    }

    private static void testParallelSort(
        String array[], final Comparator<String> comparator)
    {
        long startTimeInNano = System.nanoTime();
        Arrays.parallelSort(array, comparator);
        long endTimeInNano = System.nanoTime();
        System.out.println("Arrays.parallelSort: totalTimeInMicro= " + 
            ((endTimeInNano - startTimeInNano)/1000)+", first "+array[0]);
    }

}

This is a reasonable option, considering the trade-off between the effort of getting a JMH benchmark up and running, and the reliability of the results. This test will print something like

...
Arrays.sort        : totalTimeInMicro= 530669, first 999999
Arrays.parallelSort: totalTimeInMicro= 158227, first 999999

Showing that the parallel sort should be faster, at least.

I made some more tests, using an initial size of 50k and increasing the size by 50k up to 10 millions. For the final size, I got the following result: `Arrays.sort: totalTimeInMicro= 653260 Arrays.parallelSort: totalTimeInMicro= 257168` — Slimu, Mar 24 '15 at 14:07
I made the mistake of considering single runs with different values good enough for a simple test. I'll read more about micro-benchmarking. — Slimu, Mar 24 '15 at 14:10
Genuine question: Shouldn't you only Shuffle once, and then let the other array be a copy of that shuffle? So that you are comparing the exact same thing. Upvoted. — Peheje, Jan 02 '17 at 23:13
@Peheje Basically, you are right. I could try to defend me: This part of the code was basically taken from the question. *Theoretically*, it *could* happen that one shuffle causes an order of the strings that is "easier to sort" (e.g. alrady in ascending order). But considering that there are 10 runs with up to 1 million strings, and the `shuffle` generates a truly (pseudo)random order, I'm pretty sure that differences will be not measurable here. However, one could shuffle the first list, and then use `values2 = values1.clone();` to be on the safe side - or do a *real* benchmark with JMH ;-) — Marco13, Jan 02 '17 at 23:23

Parallel sort slower than serial sort

1 Answers1