Well, as discussed in this question, there is hardly any reason to use LinkedList
at all. The higher iteration costs apply to all operations, not just parallel streams.
Generally, the splitting support has indeed a big impact on the parallel performance. First, whether it has a genuine, hopefully cheap, splitting support rather than inheriting the buffering default behavior of AbstractSpliterator
, second, how balanced the splits are.
In this regard, there is no reason why a binary tree should perform badly. A tree can be split into sub-trees easily and if the tree is balanced at the beginning, the splits will be balanced too. Of course, this requires that the actual Collection
implementation implements the spliterator()
method returning a suitable Spliterator
implementation rather than inheriting the default
method. E.g. TreeSet
has a dedicated spliterator. Still, iterating the sub-trees might be more expensive than iterating an array, but that’s not a property of the parallel processing, as that would apply to sequential processing as well or any kind of iteration over the elements in general.
The question, how to use LinkedList
and BlockingQueue
in case of parallel streams, is moot. You choose the collection type depending on the application’s needs and if you really need one of these (in case of LinkedList
hard to imagine), then you use it and live with the fact that its parallel stream performance would be less than that of ArrayList
, which apparently didn’t fit your other needs. There is no general trick to make the parallel stream performance of badly splittable collections better. If there was, it would be part of the library.
There are some corner cases where the JRE doesn’t provide the maximum performance, which will be addressed in Java 9, like String.chars()
, Files.lines()
or the default spliterator for 3rd part RandomAccess
List
s, but none of these apply to LinkedList
, BlockingQueue
or custom Binary Tree implementations.
In other words, if you have a particular use case with a particular collection, there might be something to improve, but there is no trick that could improve the parallel performance of all tasks with all collections.
It is correct that stateful intermediate operations like distinct()
, sorted()
, limit()
, skip()
have higher costs for parallel streams and their documentation even tells this. So we could give the general advice to avoid them, especially for parallel stream, but that would be kind of pointless, as you didn’t use them, if you didn’t need them. And again, there is no general work-around for that, as there wouldn’t be much sense in offering these operations if there was a generally better alternative.