Performance of Java Parallel Stream vs ExecutorService

Question

Suppose we have a list and want to pick all the elements satisfying a property(let say some functions f). There are 3 ways to parallel this process.

One :

listA.parallelStream.filter(element -> f(element)).collect(Collectors.toList());

Two:

listA.parallelStream.collect(Collectors.partitioningBy(element -> f(element))).get(true);

Three:

ExecutorService executorService = Executors.newFixedThreadPool(nThreads);
//separate the listA into several batches
for each batch {
     Future<List<T>> result = executorService.submit(() -> {
          // test the elements in this batch and return the validate element list
     });
}
//merge the results from different threads.

Suppose the testing function is a CPU intensive task. I want to know which method is more efficient. Thanks a lot.

I don't the exact right answer, but my non-scientific answer would be to go with option one. It's clear and concise and the performance difference between either of those other solutions is negligible. That's just me speculating though. — Sam Orozco, Sep 12 '18 at 03:45
When to use parallelStreams : http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html — Raj, Sep 12 '18 at 03:50
Cause the answer may vary depending the context, i suggest you to explain what is that intensive task doing. — Rcordoval, Sep 12 '18 at 04:23
For another question, i wrote a performance comparison of manual vs. parallel stream implementations. My personnal conclusion **on this precise use case** (the filtering was a simple string.startsWith call) was that writing ExecutorService code might be faster as the list grows huge but not necessarily worth risking a bug / corner case that the stream API authors have already sorted. https://stackoverflow.com/questions/52271232/prefix-search-algorithim-using-multi-cores/52276163#52276163 . I suspect that if the filtering is complex, it will probably dwarf threading implementation differences. — GPI, Sep 12 '18 at 08:58

Evgeniy Dorofeev · Answer 1 · 2018-09-12T05:50:48.450

5

One and two use ForkJoinPool which is designed exactly for parallel processing of one task while ThreadPoolExecutor is used for concurrent processing of independent tasks. So One and Two are supposed to be faster.

edited Sep 12 '18 at 05:50

answered Sep 12 '18 at 05:18

Evgeniy Dorofeev

133,369
30
199
275

1

In which regard does “designed exactly for parallel processing of one task” materialize in practice? Don’t you think that the fact that the variant two does almost doubled work, has an actual impact on performance? – Holger Sep 12 '18 at 08:01

score 3 · Answer 2 · answered Sep 12 '18 at 08:17

When you use .filter(element -> f(element)).collect(Collectors.toList()), it will collect the matching elements into a List, whereas .collect(Collectors.partitioningBy(element -> f(element))) will collect all elements into either of two lists, followed by you dropping one of them and only retrieving the list of matches via .get(true).

It should be obvious that the second variant can only be on par with the first in the best case, i.e. if all elements match the predicate anyway or when the JVM’s optimizer is capable of removing the redundant work. In the worst lase, e.g. when no element matches, the second variant collects a list of all elements only to drop it afterwards, where the first variant would not collect any element.

The third variant is not comparable, as you didn’t show an actual implementation but just a sketch. There is no point in comparing a hypothetical implementation with an actual. The logic you describe, is the same as the logic of the parallel stream implementation. So you’re just reinventing the wheel. It may happen that you do something slightly better than the reference implementation or just better tailored to your specific task, but chances are much higher that you overlook things which the Stream API implementors did already consider during the development process which lasted several years.

So I wouldn’t place any bets on your third variant. If we add the time you will need to complete the third variant’s implementation, it can never be more efficient than just using either of the other variants.

So the first variant is the most efficient variant, especially as it is also the simplest, most readable, directly expressing your intent.

Someone argues that there are some thread safety issues for the first approach. Because the different threads adding elements into a same list concurrently is low efficient. While with the second approach, the list insert synchronization can be avoided. Not sure this view is correct. — Polaris, Sep 12 '18 at 14:58
There are no thread safety issues with the first approach. It’s the principle of `collect` that multiple threads accumulate into different containers and merge afterwards. Further, `Collectors.partitioningBy(predicate)` is just a short-hand for `Collectors.partitioningBy(predicate, Collectors.toList())`, using exactly the same collector for producing the lists. Therefore, it doesn’t add anything to thread safety nor performance. It’s the right approach if you really need both lists, but if you only need one of them, just use `.filter(predicate).collect(Collectors.toList())` in the first place. — Holger, Sep 12 '18 at 15:39

Performance of Java Parallel Stream vs ExecutorService

2 Answers2