Effect of splittability and statefulness on the Stream parallel processing

Question

I am working with Stream parallel processing and get to know that if I am using plane Array stream, it gets processed very fast. But if I am using ArrayList, then the processing gets a bit slower. But if I use LinkedList or some Binary Tree, the processing gets even more slower.

All that sounds like the more is the splittability of stream, more faster the processing would be. That means array and array list is most efficient in case of parallel streams. Is it true? If so, Shall we always use ArrayList or Array if we want to process stream in parallel? If so, how to use LinkedList and BlockingQueue in case of parallel stream?

Another thing is the state-fulness of the intermediate functions chosen. If I perform stateless operations like filter(), map(), the performance is high, but if perform the state full operations like distinct(), sorted(), limit(), skip(), it takes a lot of time. So again, the parallel stream get slower. Does that means we should not go for state full intermediate functions in parallel stream? If so, then what is the work around for that?

score 5 · Answer 1 · answered Aug 02 '17 at 09:45

Well, as discussed in this question, there is hardly any reason to use LinkedList at all. The higher iteration costs apply to all operations, not just parallel streams.

Generally, the splitting support has indeed a big impact on the parallel performance. First, whether it has a genuine, hopefully cheap, splitting support rather than inheriting the buffering default behavior of AbstractSpliterator, second, how balanced the splits are.

In this regard, there is no reason why a binary tree should perform badly. A tree can be split into sub-trees easily and if the tree is balanced at the beginning, the splits will be balanced too. Of course, this requires that the actual Collection implementation implements the spliterator() method returning a suitable Spliterator implementation rather than inheriting the default method. E.g. TreeSet has a dedicated spliterator. Still, iterating the sub-trees might be more expensive than iterating an array, but that’s not a property of the parallel processing, as that would apply to sequential processing as well or any kind of iteration over the elements in general.

The question, how to use LinkedList and BlockingQueue in case of parallel streams, is moot. You choose the collection type depending on the application’s needs and if you really need one of these (in case of LinkedList hard to imagine), then you use it and live with the fact that its parallel stream performance would be less than that of ArrayList, which apparently didn’t fit your other needs. There is no general trick to make the parallel stream performance of badly splittable collections better. If there was, it would be part of the library.

There are some corner cases where the JRE doesn’t provide the maximum performance, which will be addressed in Java 9, like String.chars(), Files.lines() or the default spliterator for 3rd part RandomAccess Lists, but none of these apply to LinkedList, BlockingQueue or custom Binary Tree implementations.

In other words, if you have a particular use case with a particular collection, there might be something to improve, but there is no trick that could improve the parallel performance of all tasks with all collections.

It is correct that stateful intermediate operations like distinct(), sorted(), limit(), skip() have higher costs for parallel streams and their documentation even tells this. So we could give the general advice to avoid them, especially for parallel stream, but that would be kind of pointless, as you didn’t use them, if you didn’t need them. And again, there is no general work-around for that, as there wouldn’t be much sense in offering these operations if there was a generally better alternative.

score 4 · Accepted Answer · answered Aug 02 '17 at 08:24

Not a bad questions IMO.

Of course the array and ArrayList are going to be splittable much better then LinkedList or some type of a Tree. You can look at how their Spliterators are made to convince yourself. They usually start with some batch size (1024 elements) and increase from that. LinkedList does that and Files.lines if I remember correctly. So yes, using an arrays and ArrayList will have a very good parallelization.

If you want a better parallel support for some structures like LinkedList you could write your own spliterator - I think StreamEx did that for Files.lines to start with a smaller batch size... And this is a related question btw.

The other thing is that when you use stateful intermediate operations - you will effectively make intermediate operations that are above that stateful one - into stateful too... Let me provide an example:

  IntStream.of(1, 3, 5, 2, 6)
            .filter(x -> {
                System.out.println("Filtering : " + x);
                return x > 2;
            })
            .sorted()
            .peek(x -> System.out.println("Peek : " + x))
            .boxed()
            .collect(Collectors.toList());

This will print:

Filtering : 1
Filtering : 3
Filtering : 5
Filtering : 2
Filtering : 6
Peek : 3
Peek : 5
Peek : 6

Because you have used sorted and filter is above that, filter has to take all elements and process them - so that sorted is applied to the correct ones.

On the other hand if you dropped sorted:

  IntStream.of(1, 3, 5, 2, 6)
            .filter(x -> {
                System.out.println("Filtering : " + x);
                return x > 2;
            })
            // .sorted()
            .peek(x -> System.out.println("Peek : " + x))
            .boxed()
            .collect(Collectors.toList());

The output is going to be:

 Filtering : 1
 Filtering : 3
 Peek : 3
 Filtering : 5
 Peek : 5
 Filtering : 2
 Filtering : 6
 Peek : 6

Generally I do agree, I try to avoid (if I can) stateful intermediate operations - may be you don't want sorted let's say - may be you can collect to a TreeSet... etc. But I don't overthink it - it I need to use it - I just do and may be measure to see if it's really a bottleneck.

Unless you are really hitting some performance problems around this - I would not take that much into account; especially since you would need lots of elements to actually have some speed benefit from parallel.

Here is a related question that shows that you really really need lots of elements to see a performance gain.

But still i didn't get any alternative for the stateful operations? — KayV, Aug 02 '17 at 09:31

Effect of splittability and statefulness on the Stream parallel processing

2 Answers2