
I have the following code (just an example that I wrote for this question), which simply calculates the sum of a range. I implemented it in three ways:

  1. Serial
  2. Parallel Stream
  3. With ForkJoinPool

Surprisingly, the serial method was the fastest one. In fact, it takes 10% of the time of the other two.

What is the right configuration for Java streams to make them faster? What am I doing wrong?

package ned.main;

import java.util.Date;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class TestParallelStream {

    static private void testParallelStream() {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "1000000");

        ForkJoinPool forkJoinPool = new ForkJoinPool(10000);

        Date start = new Date();

        long sum1 = 0;
        for (int i = 0; i < 1_000_000; ++i) {
            sum1 += i * 10;
        }

        Date start1 = new Date();

        long sum2 = IntStream.range(1, 1_000_000)
                        .parallel()
                        .map(x -> x * 10)
                        .sum();

        Date start2 = new Date();

        try {
            long sum3 = forkJoinPool.submit(() -> 
                IntStream
                    .range(1, 1_000_000)
                    .parallel()
                    .map(x -> x * 10)
                    .sum())
                        .get();
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }

        Date start3 = new Date(); // end of the ForkJoinPool variant

        long serial = start1.getTime() - start.getTime();
        long parallelstream = start2.getTime() - start1.getTime();
        long withfork = start3.getTime() - start2.getTime();

        System.out.println(serial + "," + parallelstream + "," + withfork);

    }

    public static void main(String[] args) {
        testParallelStream();
    }
}

Thanks

Samer Aamar
  • Read http://stackoverflow.com/questions/20375176/should-i-always-use-a-parallel-stream-when-possible – JB Nizet Mar 30 '17 at 22:50
  • Related: [How do I write a correct micro-benchmark in Java?](//stackoverflow.com/q/504103) – 4castle Mar 30 '17 at 22:51
  • The JVM takes a lot longer to set up streams/threads than to accumulate a multiplication. Try more complexity and the numbers will change. – bichito Mar 30 '17 at 23:04
  • Thanks for the comments. I agree that the sample program I gave might be the case you mentioned. But how can I configure the parallel approach correctly, and how can I ensure I am getting optimal performance? In my real application I have thousands of vectors and I need to find the one nearest to a given vector. That is more complex, but it still shows that serial is faster. – Samer Aamar Mar 31 '17 at 07:28

2 Answers


You seem to have a fundamentally wrong understanding of the parallelism property. To utilize all CPU cores for a computation, the parallelism should match the number of cores, which is already the default.

Setting the parallelism to 1000000 makes no sense, even in the unlikely case that you really have 1000000 processors, as in that case it would still be redundant to set what is already the default. As a side note, if you had 1000000 processing units, a job consisting of 1000000 multiplications would be far too small to benefit from that hardware: you would effectively be asking for one thread per int multiplication, which is absurd.

If in doubt, don’t mess with that setting and leave the parallelism at its default.
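
For reference, here is a small self-contained sketch (class name is just for illustration) that prints the defaults a given JVM actually uses; the common pool's parallelism is derived from the processor count, typically one less than the number of available processors:

import java.util.concurrent.ForkJoinPool;

public class ShowDefaultParallelism {
    public static void main(String[] args) {
        // Number of hardware threads the JVM sees.
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());

        // Parallelism of the common pool that parallel streams use by default.
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
    }
}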

It still depends on the actual operation whether it will benefit from parallel processing. The JVM's optimizer will only process small chunks of sequential code, so splitting an operation into chunks to be processed in parallel may reduce the benefit of code optimization.

In the most extreme variant, a loop of the form

long sum1 = 0;
for(int i=from; i<to; ++i) sum1 += i * constant;

can be optimized to

long sum1 = ((long) from + to - 1) * (to - from) / 2 * constant;

which would result in a constant calculation time for arbitrary ranges, so splitting the range into subranges, to be calculated in parallel, couldn’t shorten the required time in general. But that’s, of course, JVM specific.
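
For example, a quick check with the question's range and constant (a sketch, just to verify the equivalence) shows that the loop and the closed-form expression produce the same value:

int from = 1, to = 1_000_000, constant = 10;

long loopSum = 0;
for (int i = from; i < to; ++i) loopSum += i * constant;

long closedForm = ((long) from + to - 1) * (to - from) / 2 * constant;

System.out.println(loopSum + " == " + closedForm); // both are 4999995000000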

In case of HotSpot, which has some very restrictive inlining thresholds, it can happen that performing the operation with stream code exceeds them, reducing the JVM's optimization potential. Whether this happens can be tested by also benchmarking an equivalent sequential stream operation, as shown below. In the best case, it should perform exactly like the loop. If not, you know that the stream operation carries a disadvantage over the loop that will apply to parallel streams as well. Tuning the JVM options may help (hopefully, the defaults will become more “stream friendly” in the future).
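
A minimal sketch of that sequential baseline, using the same range and constant as in the question (class name and one-shot timing via System.nanoTime() are just for illustration; for trustworthy numbers use a harness such as JMH, linked in the comments above):

import java.util.stream.IntStream;

public class SequentialStreamBaseline {
    public static void main(String[] args) {
        // Plain loop baseline.
        long t0 = System.nanoTime();
        long loopSum = 0;
        for (int i = 1; i < 1_000_000; ++i) {
            loopSum += i * 10;
        }
        long t1 = System.nanoTime();

        // The same work as a sequential stream (no .parallel()).
        long streamSum = IntStream.range(1, 1_000_000)
                .mapToLong(x -> x * 10L)
                .sum();
        long t2 = System.nanoTime();

        System.out.println("loop:   " + (t1 - t0) + " ns, sum = " + loopSum);
        System.out.println("stream: " + (t2 - t1) + " ns, sum = " + streamSum);
    }
}

If the sequential stream is already much slower than the loop, the overhead lies in the stream pipeline (or missed inlining), not in the parallelization itself.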

Holger

In my personal experience, a sequential stream is the better choice for 99% of tasks compared to a parallel stream. Doug Lea's article on when to use parallel streams is worth reading. Basically, consider a parallel stream only when you actually hit a performance issue. Some hints on when it can pay off:

  1. The stream's steps make DB/network or other I/O calls.
  2. The calculation is CPU-heavy (e.g. summing more than 10K ints; adding tens or hundreds of ints is not a heavy calculation).

Personally, I think parallel streams are overemphasized for daily coding.
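
To illustrate the second hint, here is a rough sketch with a deliberately heavy per-element computation (the workload, sizes and class name are made up purely for illustration); with work this coarse the parallel version can use several cores, while with a trivial mapping like x * 10 it usually cannot win:

import java.util.stream.IntStream;

public class HeavyWorkParallelExample {

    // Deliberately expensive per-element work, far more than a single multiplication.
    static double heavyWork(int n) {
        double acc = 0;
        for (int i = 1; i <= 20_000; i++) {
            acc += Math.sqrt(n + (double) i);
        }
        return acc;
    }

    public static void main(String[] args) {
        double sequential = IntStream.range(0, 10_000)
                .mapToDouble(HeavyWorkParallelExample::heavyWork)
                .sum();

        double parallel = IntStream.range(0, 10_000)
                .parallel()
                .mapToDouble(HeavyWorkParallelExample::heavyWork)
                .sum();

        // The two sums agree up to floating-point rounding; only the elapsed time differs.
        System.out.println(sequential + " / " + parallel);
    }
}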

user_3380739