Why is parallel execution taking longer than sequential in this snippet?

Question

The parallel version of the code below takes longer than the sequential version. I know that parallel streams are more complex and more expensive than sequential streams, and that we can't expect them to work wonders all the time. I am just concerned about the code below:

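// Start the timer just before the stream pipeline runs.
long startTime = System.nanoTime();

List<Integer> collect = IntStream.rangeClosed(1, 1000000)
    .unordered()
    .parallel()
    .filter(e -> e % 7 == 0)
    .boxed()
    .collect(Collectors.toList());

// Stop the timer before printing, so console output is not measured.
long endTime = System.nanoTime();

collect.forEach(System.out::println);
System.out.println(endTime - startTime);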

Output (nanoseconds):

With sequential stream: 40 227 795
With parallel stream: 74 656 768

Is this stream stateful? If not, why does it take longer with the parallel stream? What could be the reason? Is it possible to pinpoint the cause?

1. [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) 2. parallel execution has more complexity, it introduces an overhead that only pays off if you know what you are doing, if the task is properly suited for parallel processing. You cannot just make something parallel and expect it to be faster, that is not how things work. — luk2302, Dec 06 '19 at 14:10
Does this answer your question? [Should I always use a parallel stream when possible?](https://stackoverflow.com/questions/20375176/should-i-always-use-a-parallel-stream-when-possible) — Turamarth, Dec 06 '19 at 14:11

score 2 · Answer 1 · answered Dec 06 '19 at 14:16

The mere fact that you are processing something in parallel doesn't mean it will be faster than sequential processing. This is quite a complex topic in programming. There is a discussion of this matter for C# here, but it also applies to Java.

In short, creating new threads is a costly operation that takes time. Working in a multithreaded environment also requires so-called context switching: there are usually more threads and processes than available cores, so they have to share the cores, which is also quite costly. You can find a gentle introduction, not tied to any specific programming language and focused on the problems in general, here.

JakeRobb · Answer 2 · 2019-12-06T15:29:03.763

By default, Stream.parallel() uses ForkJoinPool.commonPool() as its thread pool. Threads in that pool are dynamically allocated. This means that if you run the above code in isolation, your benchmark includes the time it takes for the ThreadFactory to generate the threads (a somewhat expensive operation).

That being the case, you're more likely to see benefits if you:

- pre-warm the thread pool by, for example, performing another parallel stream operation before you start the timer.
- increase the size of the threaded workload. The work you're doing is fairly trivial, so it will (apparently) take more than one million items in the stream to show the benefit.
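Both suggestions can be tried with something like the following. This is only a minimal sketch, not a rigorous benchmark: the class name `ParallelTiming`, the five warm-up rounds, and the ten-million upper bound are assumptions chosen for illustration and do not come from the question. For numbers you can trust, a harness such as JMH is the better tool.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelTiming {

    // The pipeline from the question, switchable between sequential and parallel.
    public static List<Integer> multiplesOfSeven(int upperBound, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, upperBound).unordered();
        if (parallel) {
            range = range.parallel();
        }
        return range.filter(e -> e % 7 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    // Times a single run and returns the elapsed nanoseconds.
    public static long timeNanos(int upperBound, boolean parallel) {
        long start = System.nanoTime();
        List<Integer> result = multiplesOfSeven(upperBound, parallel);
        long elapsed = System.nanoTime() - start;
        // Use the result so the work cannot be considered dead code.
        if (result.isEmpty()) {
            throw new IllegalStateException("expected at least one multiple of 7");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Assumption: ten times the question's range, to enlarge the workload.
        int upperBound = 10_000_000;

        // Warm-up (assumed count): lets the common pool create its threads
        // and gives the JIT a chance to compile the pipeline before timing.
        for (int i = 0; i < 5; i++) {
            multiplesOfSeven(upperBound, false);
            multiplesOfSeven(upperBound, true);
        }

        System.out.println("Sequential: " + timeNanos(upperBound, false) + " ns");
        System.out.println("Parallel:   " + timeNanos(upperBound, true) + " ns");
    }
}
```

Even with warm-up, a single timed run is noisy, so it is worth repeating the measurement several times and comparing the spread rather than one pair of numbers.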

Note that ForkJoinPool.commonPool() has up to (#cores - 1) threads in its pool, so you might get better or worse results running the same test on different computers -- but remember, for smaller workloads, fewer threads are more likely to show benefit due to reduced overhead.
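To see what this means on a particular machine, the pool's configured parallelism can be printed next to the core count. A small sketch, with `CommonPoolInfo` as a hypothetical class name; it relies only on `Runtime.availableProcessors()` and `ForkJoinPool.getParallelism()`:

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {

    // Target parallelism of the common pool used by parallel streams.
    public static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processors:    " + cores);
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

The default can be overridden at JVM startup with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property, which makes it possible to compare the same test with fewer or more worker threads.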
