If the input size is too small, the library automatically serializes the execution of the maps in the stream, but this automation doesn't and can't take into account how heavy the map operation is. Is there a way to force parallelStream() to actually parallelize CPU-heavy maps?

- Your linked question already contains the answer (by the esteemed Brian Goetz) to your question in the comments. – Kayaman Jun 28 '17 at 10:35
- In the linked question no answer has been given, just a workaround using other facilities for other purposes. I can't comment on his answer because I'm a new user; I was only able to open a new question. – Disruptive Jun 28 '17 at 11:28
- Well, as explained, you can't force it. Instead use an executor. Your workaround of adding redundant elements is a pretty horrible hack. – Kayaman Jun 28 '17 at 11:40
- Thank you, but my point was using just the Stream API. – Disruptive Jun 28 '17 at 11:43
- I understand that, but sometimes you need to use the right tool for the right job, instead of insisting on using the wrong tool and hacking around just because you think you have to use the wrong tool. If all you have is a hammer, then everything looks like a nail, and it sounds like your hammer is the Stream API. – Kayaman Jun 28 '17 at 11:45
- To me it looks much more like an API limitation/misbehaviour than an API hammer. An API should provide functionality; blocking a legitimate behaviour through an automatism that *should* help may be useful, but it should be possible to tune it. – Disruptive Jun 28 '17 at 11:49
- I'm not talking about the possible limitations of the Stream API. I'm talking about you choosing a suboptimal solution for the sole reason that you want to use the Stream API. That's not a very good quality in a software developer. – Kayaman Jun 28 '17 at 11:52
- I'm just trying to understand exactly what the Stream API's limits are, and discovering whether it is actually possible to trigger parallelism regardless of the stream size was one step. Deciding what is optimal or not is a job I leave to the people who know the code they are working on. – Disruptive Jun 28 '17 at 11:55
- Well, [so far](http://download.java.net/java/jdk9/docs/api/java/util/stream/package-summary.html) it seems that you won't [get what you want](http://www.baeldung.com/java-9-stream-api) in JDK9 either. – Kayaman Jun 28 '17 at 12:01
1 Answer
There seems to be a fundamental misunderstanding. The linked Q&A discusses that the stream apparently doesn’t work in parallel, due to the OP not seeing the expected speedup. The conclusion is that there is no benefit in parallel processing if the workload is too small, not that there was an automatic fallback to sequential execution.
It’s actually the opposite. If you request parallel, you get parallel, even if it actually reduces the performance. The implementation does not switch to the potentially more efficient sequential execution in such cases.
So if you are confident that the per-element workload is high enough to justify the use of a parallel execution regardless of the small number of elements, you can simply request a parallel execution.
As can easily be demonstrated:
```java
Stream.of(1, 2).parallel()
      .peek(x -> System.out.println("processing " + x + " in " + Thread.currentThread()))
      .forEach(System.out::println);
```
On Ideone, it prints

```
processing 2 in Thread[main,5,main]
2
processing 1 in Thread[ForkJoinPool.commonPool-worker-1,5,main]
1
```
but the order of messages and details may vary. It may even happen in some environments that both tasks get executed by the same thread, if it can steal the second task before another thread is started to pick it up. But of course, if the tasks are expensive enough, this will not happen. The important point is that the overall workload has been split and enqueued to be potentially picked up by other worker threads.
If execution by a single thread happens in your environment for the simple example above, you may insert simulated workload like this:
```java
// needs: import java.util.concurrent.TimeUnit;
//        import java.util.concurrent.locks.LockSupport;
Stream.of(1, 2).parallel()
      .peek(x -> System.out.println("processing " + x + " in " + Thread.currentThread()))
      .map(x -> {
          LockSupport.parkNanos("simulated workload", TimeUnit.SECONDS.toNanos(3));
          return x;
      })
      .forEach(System.out::println);
```
Then, you may also see that the overall execution time will be shorter than “number of elements” × “processing time per element”, if the “processing time per element” is high enough.
Update: the misunderstanding might be caused by Brian Goetz’s misleading statement: “In your case, your input set is simply too small to be decomposed”.
It must be emphasized that this is not a general property of the Stream API, but of the `Map` implementation that has been used. A `HashMap` has a backing array, and the entries are distributed within that array depending on their hash codes. It might be the case that splitting the array into n ranges doesn’t lead to a balanced split of the contained elements, especially if there are only two. The implementors of the `HashMap`’s `Spliterator` considered searching the array for elements to get a perfectly balanced split to be too expensive, not that splitting two elements was not worth it.
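This imbalance can be observed directly by calling `trySplit()` by hand and counting how many entries each half covers (a minimal sketch, not part of the original answer; the exact distribution depends on the JDK's internal hashing, but with the keys `"1"` and `"2"` on common OpenJDK builds both entries land in the same half of the backing array):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;

public class SplitDemo {
    // splits a two-entry HashMap's key spliterator once and
    // returns how many entries each half actually contains
    static int[] splitCounts() {
        Map<String, String> map = new HashMap<>(); // default capacity 16
        map.put("1", "a");
        map.put("2", "b");

        Spliterator<String> upper = map.keySet().spliterator();
        // trySplit halves the backing array *range*, not the elements
        Spliterator<String> lower = upper.trySplit();

        AtomicInteger lowerCount = new AtomicInteger();
        AtomicInteger upperCount = new AtomicInteger();
        if (lower != null) lower.forEachRemaining(k -> lowerCount.incrementAndGet());
        upper.forEachRemaining(k -> upperCount.incrementAndGet());
        return new int[] { lowerCount.get(), upperCount.get() };
    }

    public static void main(String[] args) {
        int[] counts = splitCounts();
        System.out.println("lower half: " + counts[0] + ", upper half: " + counts[1]);
    }
}
```

On common OpenJDK builds this prints an uneven distribution such as `lower half: 2, upper half: 0`: each child spliterator *estimates* half the size, but all actual elements sit in one half of the array, so one worker thread would get all the work.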
Since the `HashMap`’s default capacity is 16 and the example had only two elements, we can say that the map was oversized. Simply fixing that would also fix the example:
```java
long start = System.nanoTime();
Map<String, Supplier<String>> input = new HashMap<>(2);
input.put("1", () -> {
    System.out.println(Thread.currentThread());
    LockSupport.parkNanos("simulated workload", TimeUnit.SECONDS.toNanos(2));
    return "a";
});
input.put("2", () -> {
    System.out.println(Thread.currentThread());
    LockSupport.parkNanos("simulated workload", TimeUnit.SECONDS.toNanos(2));
    return "b";
});
Map<String, String> results = input.keySet()
    .parallelStream().collect(Collectors.toConcurrentMap(
        key -> key,
        key -> input.get(key).get()));
System.out.println("Time: " + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
```
On my machine, it prints

```
Thread[main,5,main]
Thread[ForkJoinPool.commonPool-worker-1,5,main]
Time: 2058
```
The conclusion is that the Stream implementation always tries to use parallel execution if you request it, regardless of the input size. But how well the workload can be distributed to the worker threads depends on the input’s structure. Things could be even worse, e.g. if you stream lines from a file.
If you think that the benefit of a balanced split is worth the cost of a copying step, you could also use `new ArrayList<>(input.keySet()).parallelStream()` instead of `input.keySet().parallelStream()`, as the distribution of elements within an `ArrayList` always allows a perfectly balanced split.
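For contrast, a quick sketch (not from the original answer) showing that an `ArrayList`'s spliterator splits a two-element list into exact halves, since it knows each element's position:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class BalancedSplitDemo {
    // splits a two-element ArrayList's spliterator once and
    // reports the exact size of each resulting half
    static long[] splitSizes() {
        List<String> list = new ArrayList<>(List.of("1", "2"));
        Spliterator<String> upper = list.spliterator();
        Spliterator<String> lower = upper.trySplit(); // ArrayList splits exactly at the midpoint
        return new long[] { lower.estimateSize(), upper.estimateSize() };
    }

    public static void main(String[] args) {
        long[] sizes = splitSizes();
        System.out.println("lower: " + sizes[0] + ", upper: " + sizes[1]); // lower: 1, upper: 1
    }
}
```

Because `ArrayList`'s spliterator is `SIZED` and `SUBSIZED`, each half is guaranteed to carry exactly one element, so both worker threads get equal work.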

- @Holger: Is there any way to get the processed output by threads in sequence? Suppose I have a sorted array and I want to write the data to a file using parallel processing. Can I achieve this? – maverickabhi Mar 01 '19 at 08:10
- @maverickabhi you can not process “in sequence” and “using parallel processing” at the same time. These are contradicting terms. When all you want is to write an array to a file, there’s no benefit in parallel processing at all. If you have a stream with computationally expensive intermediate operations, you can try to use a parallel stream and chain `forEachOrdered` as the terminal action to write to the target file. But depending on the actual operations, the costs of writing the end result in order may still outweigh any benefit of parallel processing. – Holger Mar 01 '19 at 08:17
- `The implementors of the HashMap’s Spliterator considered searching the array for elements to get a perfectly balanced split to be too expensive, not that splitting two elements was not worth it` Is it possible to give some reference for this? I could not find such information in the `trySplit` method of `ValueSpliterator`. – Kamel Nov 28 '19 at 09:14
- ValueSpliterator tries to split the array down to ranges of size 1, doesn't it? https://www.codota.com/code/java/methods/java8.util.HMSpliterators$ValueSpliterator/%3Cinit%3E – Kamel Nov 28 '19 at 09:33
- @Holger I have written a follow-up question. Basically, I could not understand why ValueSpliterator could not split the values into pieces. https://stackoverflow.com/questions/59085959/why-values-parallelstream-does-not-run-in-parallel – Kamel Nov 28 '19 at 09:53
- Regarding "If you request parallel, you get parallel": The docs for `parallelStream()` say: "Returns a possibly parallel stream ... It is allowable for this method to return a sequential stream." – Just Me Dec 28 '20 at 14:06
- ... though it does seem `stream().parallel()` must return a parallel stream. – Just Me Dec 28 '20 at 14:09
- For me there's a very noticeable difference: `parallelStream()` runs *serially* (when presented with an array of 4 lengthy tasks, on my quad-core laptop), and only `stream().parallel()` really runs in parallel. – Just Me Dec 28 '20 at 14:22
- @Just I don’t know any real life example where `parallelStream()` behaves differently than `stream().parallel()`. Arrays have neither `stream()` nor `parallelStream()`. – Holger Jan 04 '21 at 08:33
- Tried to include a snippet from the "real life" code I referred to above, it came out horrible. – Just Me Jan 04 '21 at 09:23
- But basically, `arrayList.stream().parallel().map(...)` runs in parallel, and `arrayList.parallelStream().map(...)` does not, in real life. – Just Me Jan 04 '21 at 09:28
- @Just post a [mcve], not a snippet containing lots of non-standard methods and classes. There are online testers where you can run your code to demonstrate the behavior. Perhaps you want to open a new question instead of flooding this one with comments. – Holger Jan 04 '21 at 09:28
- Your answer is somewhat misleading, in that you conflate `parallelStream()` with `stream().parallel()`. My comments are meant to (a) clarify that the docs make a distinction and (b) that real-life experiments make a distinction. I think they are useful just where they are. – Just Me Jan 04 '21 at 09:32
- @Just There is no difference between `parallelStream()` and `stream().parallel()`, except for a single user on the internet who claims otherwise without any proof. Either you prove your claim or you stop discussing. – Holger Jan 04 '21 at 09:34
- Here's a [precise reference to the docs](https://docs.oracle.com/javase/8/docs/api/?java/util/stream/Stream.html) which for parallelStream() clearly states: "return a possibly parallel stream". A comment in the source (Corretto 15 JDK) adds: "It is allowable for this method to return a sequential stream." This is in direct contradiction to your claim that "If you request parallel, you get parallel" (which is true, but only if you use `parallel()`). I'll get back to work now and stop spreading useful information, as you suggest. – Just Me Jan 04 '21 at 09:47
- @Just this answer said “*when you request parallel*”; it never claimed that `parallelStream()` was sufficient to request parallel. But in real life, the result of `parallelStream()` always is a parallel stream, as you can query via [`isParallel()`](https://docs.oracle.com/javase/8/docs/api/java/util/stream/BaseStream.html#isParallel--). And then, the stream will operate in parallel mode, even when there is only one thread left and no benefit. That’s implied by “*even if it actually reduces the performance*”. If you find a contradicting real life example, feel free to show it. – Holger Jan 04 '21 at 09:54
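The `forEachOrdered` approach Holger suggested in the comments can be sketched like this (a hypothetical example; the squaring stands in for an expensive per-element computation, and the results come back in encounter order even though the mapping runs in parallel):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class OrderedOutputDemo {
    // maps in parallel, but collects results strictly in encounter order
    static List<Integer> process() {
        List<Integer> out = new ArrayList<>();
        IntStream.rangeClosed(1, 5)
            .parallel()
            .map(x -> x * x)           // expensive work would go here, running in parallel
            .forEachOrdered(out::add); // results are delivered in encounter order
        return out;
    }

    public static void main(String[] args) {
        System.out.println(process()); // [1, 4, 9, 16, 25]
    }
}
```

`forEachOrdered` serializes only the terminal action (e.g. writing to a file), so whether this pays off depends on how expensive the intermediate operations are relative to the ordered write, as the comment notes.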
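The `isParallel()` check mentioned in the last comment can be performed directly (a minimal sketch; per the specification `parallelStream()` returns only a *possibly* parallel stream, though common JDK implementations do return a parallel one):

```java
import java.util.List;

public class IsParallelDemo {
    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 3);
        // guaranteed parallel by the Stream.parallel() contract
        System.out.println(list.stream().parallel().isParallel()); // true
        // "possibly parallel" per the docs, but parallel on common JDKs
        System.out.println(list.parallelStream().isParallel());
    }
}
```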