Stream spliterator implementation detail

Question

While looking into the source code of WrappingSpliterator::trySplit, I was quite misled by its implementation:

    @Override
    public Spliterator<P_OUT> trySplit() {
        if (isParallel && buffer == null && !finished) {
            init();

            Spliterator<P_IN> split = spliterator.trySplit();
            return (split == null) ? null : wrap(split);
        }
        else
            return null;
    }

If you are wondering why this matters, it is because code like this, for example:

    Arrays.asList(1,2,3,4,5)
          .stream()
          .filter(x -> x != 1)
          .spliterator();

uses it. As I understand it, adding any intermediate operation to a stream causes that code to be used.
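One way to check which spliterator you actually get is to print its class. This is a minimal sketch: the names SpliteratorKinds, plain and filtered are made up for illustration, and the printed class names are OpenJDK implementation details that may differ on other JDKs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class SpliteratorKinds {
    static final List<Integer> DATA = Arrays.asList(1, 2, 3, 4, 5);

    // No intermediate operation: spliterator() hands back the source spliterator.
    static Spliterator<Integer> plain() {
        return DATA.stream().spliterator();
    }

    // One stateless intermediate operation: the source is wrapped together with the pipeline.
    static Spliterator<Integer> filtered() {
        return DATA.stream().filter(x -> x != 1).spliterator();
    }

    public static void main(String[] args) {
        // On OpenJDK: java.util.Spliterators$ArraySpliterator
        System.out.println(plain().getClass().getName());
        // On OpenJDK: java.util.stream.StreamSpliterators$WrappingSpliterator
        System.out.println(filtered().getClass().getName());
    }
}
```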

Basically, this method says that unless the stream is parallel, the Spliterator is treated as one that cannot be split at all. And this matters to me. In one of my methods (this is how I got to that code), I get a Stream as input and manually "parse" it into smaller pieces with trySplit. You can think of it, for example, as implementing a findLast for a Stream.

And this is where my desire to split into smaller chunks is nuked, because as soon as I do:

    Spliterator<T> sp = stream.spliterator();
    Spliterator<T> prefixSplit = sp.trySplit();

I find out that prefixSplit is null, which means I basically can't do anything other than consume the entire sp with forEachRemaining.
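To make the findLast use case concrete, here is a minimal sketch of a split-based findLast. The class and method names are hypothetical and the suffix-first strategy is an illustration, not the asker's actual code. It assumes an ORDERED spliterator (so a successful trySplit returns a prefix) and non-null elements. Whenever trySplit returns null, it falls back to traversing everything with forEachRemaining, which is the degradation described above.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.Spliterator;

class SplitFindLast {
    // Hypothetical helper: last element of an ORDERED spliterator.
    // Assumes non-null elements (a null last element would be reported as absent).
    static <T> Optional<T> findLast(Spliterator<T> sp) {
        // For an ORDERED spliterator, a successful trySplit returns a prefix
        // and leaves sp covering the remaining suffix.
        Spliterator<T> prefix = sp.trySplit();
        if (prefix == null) {
            // Cannot (or will not) split: traverse everything, remembering the last element.
            Object[] last = new Object[1];
            sp.forEachRemaining(t -> last[0] = t);
            @SuppressWarnings("unchecked")
            T result = (T) last[0];
            return Optional.ofNullable(result);
        }
        // Search the suffix first; only fall back to the prefix if the suffix is empty.
        Optional<T> inSuffix = findLast(sp);
        return inSuffix.isPresent() ? inSuffix : findLast(prefix);
    }

    public static void main(String[] args) {
        // Source spliterator: splits, so the prefixes are never traversed.
        System.out.println(findLast(Arrays.asList(1, 2, 3, 4, 5).stream().spliterator()));
        // filter on a sequential stream: trySplit returns null, so every element is visited.
        System.out.println(findLast(Arrays.asList(1, 2, 3, 4, 5).stream()
                .filter(x -> x != 5).spliterator()));
    }
}
```

On the source spliterator only the final, unsplittable chunk is actually traversed; on the filtered sequential stream the helper has to walk every element.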

It is a bit weird that trySplit returns null here. Maybe it makes some sense when filter is present, because in that case the only way (as I understand it) a Spliterator could be returned is by using some kind of buffer, maybe even one with a predefined size (much like Files::lines). But why this:

    Arrays.asList(1,2,3,4)
          .stream()
          .sorted()
          .spliterator()
          .trySplit();

returns null is something I don't understand. sorted is a stateful operation that buffers the elements anyway, without reducing or increasing their number, so at least in theory this could return something other than null...
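The cases from the question can be checked side by side with a small sketch. The class and the helper splits are hypothetical; since the trySplit contract allows returning null for any reason, the commented results are observations of current OpenJDK behavior, not guarantees.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TrySplitDemo {
    static final List<Integer> DATA = Arrays.asList(1, 2, 3, 4);

    // True if the spliterator agreed to split off a prefix.
    static boolean splits(Spliterator<Integer> sp) {
        return sp.trySplit() != null;
    }

    public static void main(String[] args) {
        // No intermediate operation, sequential: true (source spliterator)
        System.out.println(splits(DATA.stream().spliterator()));
        // Stateless filter, sequential: false
        System.out.println(splits(DATA.stream().filter(x -> x != 1).spliterator()));
        // Stateful sorted, sequential: false
        System.out.println(splits(DATA.stream().sorted().spliterator()));
        // Same filter pipeline, parallel: true (the split is delegated to the source)
        System.out.println(splits(DATA.parallelStream().filter(x -> x != 1).spliterator()));
    }
}
```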

Well... maybe because the stream is not parallel? Obvious question... Have you tried `Arrays.asList(1,2,3,4).parallelStream()......`? — fps, Apr 02 '19 at 20:57
@FedericoPeraltaSchaffner of course... :) But if you remove `filter` and the stream is not parallel either, the splitting will work. The documentation does not say that this must happen for a parallel stream. — Eugene, Apr 03 '19 at 04:52
from javadoc: "This method may return {@code null} for any reason, including emptiness, inability to split after traversal has commenced, data structure constraints, and efficiency considerations." — Wisthler, Apr 03 '19 at 06:59
@Wisthler yup, I read that; it's just that it feels kind of weird, I guess — Eugene, Apr 03 '19 at 07:03
When you don’t chain an intermediate operation, `spliterator()` will just return the source spliterator, i.e. `Arrays.asList(1,2,3,4,5) .stream() .spliterator() .getClass() == Arrays.spliterator(new Integer[] { 1,2,3,4,5 }) .getClass()`. Such a spliterator does not even know whether the Stream is parallel or if a Stream exist at all. And no, `Arrays.asList(1,2,3,4,5) .parallelStream() .filter(x -> x != 1) .spliterator();` does *not* need any buffering. — Holger, Apr 03 '19 at 07:22
@Holger right, I have seen that the source spliterator is returned when there are no intermediate operations; even the `WrappingSpliterator` "knows" about it via `spliteratorSupplier`. The second point is that I *thought* it might be implemented with a buffer; I looked up the implementation now and you are right, it splits the source spliterator... So the question in this case would be: is this a deliberate decision for a sequential stream? Do you happen to know the reasons for this? Thank you — Eugene, Apr 03 '19 at 07:33
Of course, it’s a deliberate decision, as someone put the `isParallel` into the `if (isParallel && …` line. Whether there is a reason (and I suppose you mean “good reason”) for that, is a different question. I don’t see any advantage in this restriction. — Holger, Apr 03 '19 at 07:51

score 1 · Accepted Answer · answered May 09 '19 at 15:48

When you invoke spliterator() on a Stream, there are only two possible outcomes with the current implementation.

If the stream has no intermediate operations, you’ll get the source spliterator that was used to construct the stream, whose splitting capability is entirely independent of the stream’s parallel state since, in fact, the spliterator doesn’t know anything about the stream.

Otherwise, you’ll get a WrappingSpliterator, which encapsulates a source Spliterator and a pipeline state, expressed as PipelineHelper. This combination of Spliterator and PipelineHelper does not need to work in parallel and, in fact, would not work in the case of distinct(), as the WrappingSpliterator gets an entirely different combination depending on whether the Stream is parallel or not.
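Why splitting would break a sequential distinct() can be illustrated with a simulation. This is a hypothetical model, not JDK code: it assumes that splitting a sequential pipeline would give each half its own copy of the distinct() state (here a separate HashSet per half), so each half only removes the duplicates it sees itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;
import java.util.stream.Collectors;

class NaiveSplitDistinct {
    // Hypothetical model: split the source once and de-duplicate each half
    // with its own Set, as if each split carried separate distinct() state.
    static List<Integer> naiveSplitDistinct(List<Integer> data) {
        Spliterator<Integer> suffix = data.spliterator();
        Spliterator<Integer> prefix = suffix.trySplit(); // may be null
        List<Integer> out = new ArrayList<>();
        for (Spliterator<Integer> half : Arrays.asList(prefix, suffix)) {
            if (half == null) continue;
            Set<Integer> seen = new HashSet<>(); // per-half state only
            half.forEachRemaining(x -> { if (seen.add(x)) out.add(x); });
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 1, 2);
        // A real distinct(): [1, 2]
        System.out.println(data.stream().distinct().collect(Collectors.toList()));
        // Per-half state: duplicates across the halves survive, [1, 2, 1, 2]
        System.out.println(naiveSplitDistinct(data));
    }
}
```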

For stateless intermediate operations, it would not make a difference though. But, as discussed in “Why the tryAdvance of stream.spliterator() may accumulate items into a buffer?”, the WrappingSpliterator is a “one-fits-all implementation” that doesn’t consider the actual nature of the pipeline, so its limitations are the superset of all possible limitations of all supported pipeline stages. So the existence of one scenario that wouldn’t work when ignoring the parallel flag is enough to forbid splitting for all pipelines that are not parallel.
