5

I have the following code ,

import java.util.Arrays;


public class ParellelStream {

    public static void main(String args[]){
        Double dbl[] = new Double[1000000];
        for(int i=0; i<dbl.length;i++){
            dbl[i]=Math.random();
        }

        long start = System.currentTimeMillis();
        Arrays.parallelSort(dbl);
        System.out.println("time taken :"+((System.currentTimeMillis())-start));

    }

}

When I run this code it takes time approx 700 to 800 ms, but when I replace the line Arrays.parallelSort to Arrays.sort it takes 500 to 600 ms. I read about the Arrays.parallelSort and Arrays.sort method which says that Arrays.parellelSort gives poor performance when dataset are small but here I am using array of 1000000 elements. what could be the reason for parallelSort poor performance ?? I am using java8.

lbalazscs
  • 17,474
  • 7
  • 42
  • 50
Rakesh Chouhan
  • 1,196
  • 12
  • 28
  • Have you tried putting the timed sort in a loop and doing it multiple times in one invocation? Starting up the threads that do the parallel sorting has an overhead, but they will be reused during the program's lifetime. So for a real-world test (unless your use case is indeed a one-shot program) you should repeat the sorting often to amortize the overhead. – Sebastian Redl May 05 '15 at 13:12
  • 2
    How many processing cores are on your test machine? Is it relatively quiescent with only your test running? – pens-fan-69 May 05 '15 at 13:12
  • Yes I have tried in that way as well @SebastianRedl – Rakesh Chouhan May 05 '15 at 13:14
  • Another suggestion would be to increase your array size by an order of magnitude or two and see how it runs. – pens-fan-69 May 05 '15 at 13:14
  • I have Quad core processor on my test machine @pens-fan-69 – Rakesh Chouhan May 05 '15 at 13:15
  • 1
    Does this help: http://stackoverflow.com/questions/23170832/java-8s-streams-why-parallel-stream-is-slower – Rahul Tripathi May 05 '15 at 13:16
  • @RakeshChouhan Given that you have multiple processing cores and assuming that there are not competing processes, it may well be that 1000000 elements is indeed considered small. As Sebastian Redl suggested, spawning threads does carry overhead. – pens-fan-69 May 05 '15 at 13:17
  • @pens-fan-69 I tried with 2000000 elements but still it is showing same behavior. – Rakesh Chouhan May 05 '15 at 13:21
  • try with 100x more elements 2000000 is still small – bhspencer May 05 '15 at 13:24
  • It may be that it doesn't make a difference, but I would suggest trying a hundred million or even a billion. Right now the tests are only taking a few hundred milliseconds which really isn't very long. That said, I think @RahulTripathi has pointed you at a very good explanation of some of the idiosyncrasies of performance testing in the java environment. – pens-fan-69 May 05 '15 at 13:24
  • @pens-fan-69 It is showing some positive changes after 5000000 elements. – Rakesh Chouhan May 05 '15 at 13:26
  • Try with more elements. Increase by orders of magnitude. – bhspencer May 05 '15 at 13:29
  • 1
    Thank you very much to all of you, Now I am clear about the parallel vs sequential sort, will be use parallel sort when data set is too large, I tried with 5000000 to 10000000 elements and now I can see the difference. but still when I run it multiple time sometimes I get faster result with sequential sort. – Rakesh Chouhan May 05 '15 at 13:36
  • http://www.ibm.com/developerworks/java/library/j-jtp12214/ – Boann May 05 '15 at 13:46
  • In retrospect I think that the performance you are seeing might be because your initial order is random. I suspect that with a truly random starting condition there isn't much gain to be had by sorting the array in parallel. – bhspencer May 05 '15 at 14:33
  • 3
    You should try a legitimate benchmarking system, such as [Caliper](https://code.google.com/p/caliper/). – Kevin Welker May 05 '15 at 14:40
  • 1
    +1 for Kevin: running this method once is not going to get you accurate benchmarking results at all. The method that appears faster the first time you run it may be 10x slower after a proper warmup. This is yet another bad measurement question. – Louis Wasserman May 05 '15 at 15:42
  • @LouisWasserman While I agree it is better to use a real benchmark, if by warmup you refer to the JIT, this can be easily disabled with `-Djava.compiler=NONE` – Jean-François Savard May 05 '15 at 15:52
  • 2
    That will get results that resemble realistic application performance even less, not more. A warmed up JIT is, in almost all cases, the most realistic context for measurement. – Louis Wasserman May 05 '15 at 15:53

2 Answers2

5

The parallelSort function will use a thread for each cpu core you have on your machine. Specifically parallelSort runs tasks on the ForkJoin common thread pool. If you only have one core you would not see an improvement over single threaded sort.

If you only have multiple cores you are going to have some upfront cost associated with creating the new threads which will mean that for relatively small arrays you are not going to see linear performance gains.

The compare function for comparing doubles is not an expensive function. I think that in this case 1000000 elements can be safely considered small and the benefits of using multiple threads is outweighed by the upfront costs of creating those threads. Since the upfront costs will be fixed you should see a performance gain with larger arrays.

bhspencer
  • 13,086
  • 5
  • 35
  • 44
  • I have Quad core processor on Test Machine. – Rakesh Chouhan May 05 '15 at 13:18
  • Try with 100x more elements in your arrays. Since the upfront costs will be fixed. You should see a performance gain with larger arrays. – bhspencer May 05 '15 at 13:21
  • 2
    Also for benchmarking I recommend using System.nanoTime() rather than System.currentTimeMillis() as currentTimeMillis does not have millisecond accuracy on most machines. – bhspencer May 05 '15 at 13:22
  • 1
    Using System.nanoTime() is an excellent suggestion @bhspencer. – pens-fan-69 May 05 '15 at 13:25
  • 1
    This SO post says that parallel sort uses multiple threads http://stackoverflow.com/questions/17328077/difference-between-arrays-sort-and-arrays-parallelsort – bhspencer May 05 '15 at 14:23
  • It is more accurate to say that the parallelSort function does not directly create the threads its self. Those threads are created and managed by the ForkJoin common pool. I'll add that clarification to my answer. – bhspencer May 05 '15 at 14:28
4

I read about the Arrays.parallelSort and Arrays.sort method which says that Arrays.parellelSort gives poor performance when dataset are small but here I am using array of 1000000 elements.

This is not the only thing to take in consideration. It depends a lot on your machine (how your CPU handle multi-threading etc).

Here a quote from the Parallelism part of The Java Tutorials

Note that parallelism is not automatically faster than performing operations serially, although it can be if you have enough data and processor cores [...] it is still your responsibility to determine if your application is suitable for parallelism.

You might also want to have a look at the code of java.util.ArraysParallelSortHelpers for a better understanding of the algorithm.

Note that the parallelSort method use the ForkJoinPool introduced in Java 7 to take advantages of each processors of your computer as stated in the javadoc :

A ForkJoinPool is constructed with a given target parallelism level; by default, equal to the number of available processors.

Note that if the length of the array is less then 1 << 13, the array will be sorted using the appropriate Arrays.sort method.

See also

Jean-François Savard
  • 20,626
  • 7
  • 49
  • 76