There are two main ways to operate on Java streams:

  • parallelStream()
  • stream()

As the post "Should I always use a parallel stream when possible?" points out, there are downsides to using parallel streams.

Is there a built-in way in Java to switch dynamically between the two, that is, to use parallel streams for large collections and sequential streams for small ones? Something like: myCollection.parallelStreamIfNeeded()

Samuel
  • You can check the size of `myCollection`. Or would this be wrong in your opinion? – akuzminykh Sep 02 '22 at 18:35
  • I would heed the advice in the first question at that link *"In any case, measure, don't guess! Only a measurement will tell you if the parallelism is worth it or not."* So would that method measure itself? What if it only gets called once? In other words, how would that work? – Federico klez Culloca Sep 02 '22 at 18:50
  • @akuzminykh I could do that. I could even write a function so I don't duplicate code. But I thought there might be a smarter way to determine which stream type to use. Also, depending on the collection, determining its size might take O(n) time, which is bad. – Samuel Sep 02 '22 at 19:29
  • @Samuel You should not worry too much about this, or about whether to use a parallel stream at all. Just focus on getting things done, i.e. functionality, features. Parallel streams in Java run on the `commonPool` of the fork/join framework; you may want to read up on it. The threads of that pool are likely already initialized when your tasks run, so there will be a bit of overhead, but not as much as you might think. If you expect your call to handle many items, use `parallelStream`; otherwise use `stream`. If you want to be sure, measure it, as the second comment suggests. – akuzminykh Sep 02 '22 at 20:47
  • There’s no way for the library to know beforehand the weight of the operations you will chain. You are the only one who can know this. That’s why you are the one to decide which type of stream to use. But keep in mind that for a parallel stream to pay off, you not only need a sufficient number of elements and a sufficiently expensive operation, but also free worker threads to utilize. If each CPU core is busy processing data anyway (e.g. because you’re processing a million incoming independent requests), there’s no point in parallelism within a single processing task. – Holger Sep 05 '22 at 09:02
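The "measure, don't guess" advice in the comments above could be followed with a rough timing harness like the one below. This is only a sketch: `StreamTimingSketch`, `work`, and the data size are made-up placeholders, and plain `System.nanoTime()` timing is crude. A proper benchmark tool such as JMH would give far more trustworthy numbers.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class StreamTimingSketch {

    /** Simulated per-element work; a stand-in for your real operation. */
    static long work(int x) {
        long acc = x;
        for (int i = 0; i < 1_000; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    static long sumSequential(List<Integer> data) {
        return data.stream().mapToLong(StreamTimingSketch::work).sum();
    }

    static long sumParallel(List<Integer> data) {
        return data.parallelStream().mapToLong(StreamTimingSketch::work).sum();
    }

    /** Crude wall-clock timing in nanoseconds, after a few warm-up runs. */
    static long timeNanos(ToLongFunction<List<Integer>> task, List<Integer> data) {
        for (int i = 0; i < 5; i++) {
            task.applyAsLong(data); // warm-up so the JIT has a chance to compile
        }
        long start = System.nanoTime();
        task.applyAsLong(data);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // 100_000 elements is an arbitrary size; try the sizes you actually expect.
        List<Integer> data = IntStream.range(0, 100_000).boxed().collect(Collectors.toList());

        // Parallel streams only help if the common pool has workers to spare.
        System.out.println("common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("sequential ns: " + timeNanos(StreamTimingSketch::sumSequential, data));
        System.out.println("parallel ns:   " + timeNanos(StreamTimingSketch::sumParallel, data));
    }
}
```

Run it on the target machine under a realistic load; as Holger notes, the result depends on whether worker threads are actually free.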

1 Answer

As the comments above mention, there is no library feature or automatic mechanism that decides when a parallel stream is worthwhile. However, if you find a reasonable criterion yourself (for example, by running a few benchmarks), one way to write the code would be:

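// Needs: import java.util.Collection; import java.util.stream.StreamSupport;
StreamSupport.stream(myCollection.spliterator(), determineIfParallelStreamRequired(myCollection))
   .map(...) // Do whatever operation you're going to perform on the stream

private boolean determineIfParallelStreamRequired(Collection<String> myCollection) {
   // Decide whether to go parallel. PARALLEL_THRESHOLD is a hypothetical,
   // benchmark-derived constant that you would define yourself.
   return myCollection.size() >= PARALLEL_THRESHOLD;
}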
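
For reference, a self-contained version of the same idea might look like the sketch below. `AdaptiveStreams`, `streamFor`, and the threshold parameter are illustrative names, not part of the JDK, and the size check is only one possible criterion. The only library call it relies on is `StreamSupport.stream(spliterator, parallel)`.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class AdaptiveStreams {

    private AdaptiveStreams() {
        // static helper only
    }

    /**
     * Returns a parallel stream when the collection holds at least
     * parallelThreshold elements, otherwise a sequential one.
     * The threshold is a hypothetical tuning value, not a JDK constant.
     */
    static <T> Stream<T> streamFor(Collection<T> source, int parallelThreshold) {
        boolean parallel = source.size() >= parallelThreshold;
        return StreamSupport.stream(source.spliterator(), parallel);
    }

    public static void main(String[] args) {
        List<String> small = List.of("a", "b", "c");
        // 10_000 is an arbitrary placeholder; derive the real value from benchmarks.
        System.out.println(streamFor(small, 10_000).isParallel()); // prints false
    }
}
```

Passing the threshold as a parameter keeps the decision with the caller, who, as the comments point out, is the only one who knows how expensive the chained operations are.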
user1692342