I mostly agree with @Holger's great answer, but I would place the emphasis differently. I think it is hard for you to see the need for a buffer because you have a very simplistic mental model of what the Stream API allows. If one thinks of a Stream as a sequence of map and filter operations, there is no need for an additional buffer, because those operations have two important "good" properties:
- Work on one element at a time
- Produce 0 or 1 element as a result
However, those properties do not hold in the general case. As @Holger (and I in my original answer) mentioned, Java 8 already has flatMap, which breaks rule #2, and Java 9 finally added takeWhile, which is really a transformation of the whole Stream (Stream -> Stream) rather than a per-element one (and it is one of the few short-circuiting intermediate operations, alongside limit from Java 8).
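To make the difference concrete, here is a small sketch (the class and method names are made up for illustration, and takeWhile needs Java 9 or later). filter keeps the "0 or 1 result per element" property, flatMap produces several results per element, and the result of takeWhile depends on where an element sits in the stream, not only on the element itself:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: these helpers are not part of the question or of the JDK.
class ElementCounts {
    // filter: every input element yields 0 or 1 output elements.
    static List<Integer> evens(List<Integer> in) {
        return in.stream().filter(n -> n % 2 == 0).collect(Collectors.toList());
    }

    // flatMap: every input element may yield many output elements (here: 2).
    static List<Integer> doubled(List<Integer> in) {
        return in.stream().flatMap(n -> Stream.of(n, n)).collect(Collectors.toList());
    }

    // takeWhile (Java 9+): stops at the first failing element, so a later
    // element that satisfies the predicate is still dropped.
    static List<Integer> prefixBelowThree(List<Integer> in) {
        return in.stream().takeWhile(n -> n < 3).collect(Collectors.toList());
    }
}
```

For the input 1, 2, 3, 1 the last method returns only 1, 2: the trailing 1 passes the predicate but is never reached.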
Another point where I don't quite agree with @Holger: I think the most fundamental reason is a bit different from the ones he gives in the second paragraph, i.e. a) that you may call tryAdvance past the end of the Stream many times and b) that "there is no guaranty that the caller will always pass the same consumer". I think the most important reason is that Spliterator, being functionally identical to Stream, has to support short-circuiting and laziness (i.e. the ability to not process the whole Stream, or else it can't support unbounded streams). In other words, even if the Spliterator API (quite strangely) required you to use the same Consumer object for all calls of all methods of a given Spliterator, you would still need tryAdvance, and that tryAdvance implementation would still have to use some buffer. You just can't stop processing data if all you've got is forEachRemaining(Consumer<? super T>), so you can't implement anything similar to findFirst or takeWhile with it. Actually, this is one of the reasons why the JDK implementation internally uses the Sink interface rather than Consumer (and what "wrap" in wrapAndCopyInto stands for): Sink has an additional boolean cancellationRequested() method.
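A minimal sketch of that point, assuming an infinite source built with Stream.iterate (the class and method names are hypothetical). With tryAdvance the caller decides after every element whether to continue. Calling forEachRemaining on the same Spliterator would never return, because a plain Consumer has no way to say "stop":

```java
import java.util.Spliterator;
import java.util.stream.Stream;

// Illustrative only: StopEarly and firstAbove are made-up names.
class StopEarly {
    // Returns the first natural number strictly greater than threshold.
    static int firstAbove(int threshold) {
        // Infinite source: 1, 2, 3, ...
        Spliterator<Integer> source = Stream.iterate(1, n -> n + 1).spliterator();
        int[] found = {0};
        boolean[] done = {false};
        // The loop condition is re-checked after each element,
        // which is exactly the hook that forEachRemaining lacks.
        while (!done[0] && source.tryAdvance(n -> {
            if (n > threshold) {
                found[0] = n;
                done[0] = true;
            }
        })) {
            // nothing to do here; the work happens in the consumer
        }
        return found[0];
    }
}
```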
So to sum up: a buffer is required because we want Spliterator:
- To use a simple Consumer that provides no means to report back end of processing/cancellation
- To provide a means to stop processing of the data at the request of the (logical) consumer.
Note that those two requirements are actually slightly contradictory.
Example and some code
Here I'd like to provide an example of code that I believe is impossible to implement without an additional buffer, given the current API contract (interfaces). It is based on your example.
There is the simple Collatz sequence of integers, which is conjectured to always eventually hit 1. AFAIK this conjecture has not been proved yet, but it has been verified for many integers (at least for the whole 32-bit int range).
So assume the problem we are trying to solve is the following: from a stream of Collatz sequences for random start numbers in the range from 1 to 1,000,000, find the first number that contains "123" in its decimal representation.
Here is a solution that uses just Stream (not a Spliterator):
    static String findGoodNumber() {
        return new Random()
                .ints(1, 1_000_000) // unbounded!
                .flatMap(nr -> collatzSequence(nr))
                .mapToObj(Integer::toString)
                .filter(s -> s.contains("123"))
                .findFirst().get();
    }
where collatzSequence is a function that returns a Stream (an IntStream here, given the flatMap call above) containing the Collatz sequence up to the first 1 (and, for nitpickers, let it also stop when the current value is bigger than Integer.MAX_VALUE / 3, so we don't hit overflow).
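The answer only describes collatzSequence, so here is one possible sketch of it, under the assumption that it returns an IntStream and includes the terminating 1. It builds the sequence eagerly with IntStream.builder(), which is acceptable only because each sequence is finite; the class name Collatz is made up:

```java
import java.util.stream.IntStream;

// Illustrative only: one possible shape of the collatzSequence helper.
class Collatz {
    static IntStream collatzSequence(int start) {
        IntStream.Builder builder = IntStream.builder();
        int current = start;
        builder.add(current);
        // Stop at the first 1, or as soon as 3 * current + 1 could overflow an int.
        while (current != 1 && current <= Integer.MAX_VALUE / 3) {
            current = (current % 2 == 0) ? current / 2 : 3 * current + 1;
            builder.add(current);
        }
        return builder.build();
    }
}
```

For example, a start value of 6 gives 6, 3, 10, 5, 16, 8, 4, 2, 1.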
Every such Stream returned by collatzSequence is bounded. Also, the standard Random will eventually generate every number in the provided range. This means we are guaranteed that some "good" number will eventually appear in the stream (for example, just 123), and since findFirst is short-circuiting, the whole operation will actually terminate. However, no reasonable Stream API implementation can predict this.
Now let's assume that for some strange reason you want to do the same thing using an intermediate Spliterator. Even though you have only one piece of logic and no need for different Consumers, you can't use forEachRemaining. So you'll have to do something like this:
    static Spliterator<String> createCollatzRandomSpliterator() {
        return new Random()
                .ints(1, 1_000_000) // unbounded!
                .flatMap(nr -> collatzSequence(nr))
                .mapToObj(Integer::toString)
                .spliterator();
    }

    static String findGoodNumberWithSpliterator() {
        Spliterator<String> source = createCollatzRandomSpliterator();
        String[] res = new String[1]; // workaround for the "effectively final" closure restriction
        while (source.tryAdvance(s -> {
            if (s.contains("123")) {
                res[0] = s;
            }
        })) {
            // stop as soon as the consumer has recorded a match
            if (res[0] != null)
                return res[0];
        }
        throw new IllegalStateException("Impossible");
    }
It is also important that for some starting numbers the Collatz sequence will contain several matching numbers. For example, both 41123 and 123370 (= 41123*3+1) contain "123". This means we really don't want our Consumer to be called after the first matching hit. But since Consumer doesn't expose any means to report the end of processing, WrappingSpliterator can't just pass our Consumer to the inner Spliterator. The only solution is to accumulate all results of the inner flatMap (with all the post-processing) into some buffer and then iterate over that buffer one element at a time.
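To illustrate the idea, here is a heavily simplified sketch of such a buffering Spliterator. It is not the JDK's WrappingSpliterator: the class name is made up, the mapper returns an Iterable instead of a Stream, and null elements are not supported because ArrayDeque rejects them. One call to tryAdvance on the source may produce many results, so they are parked in a buffer and handed to the caller's Consumer one at a time:

```java
import java.util.ArrayDeque;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Function;

// Illustrative only: a simplified stand-in for the buffering done inside the JDK.
class BufferingFlatMapSpliterator<T, R> extends Spliterators.AbstractSpliterator<R> {
    private final Spliterator<T> source;
    private final Function<? super T, ? extends Iterable<? extends R>> mapper;
    private final ArrayDeque<R> buffer = new ArrayDeque<>();

    BufferingFlatMapSpliterator(Spliterator<T> source,
                                Function<? super T, ? extends Iterable<? extends R>> mapper) {
        super(Long.MAX_VALUE, Spliterator.ORDERED); // size unknown
        this.source = source;
        this.mapper = mapper;
    }

    @Override
    public boolean tryAdvance(Consumer<? super R> action) {
        // Refill the buffer: one source element may expand to 0..n results.
        while (buffer.isEmpty()) {
            boolean advanced = source.tryAdvance(t -> mapper.apply(t).forEach(buffer::add));
            if (!advanced) {
                return false; // source exhausted and nothing buffered
            }
        }
        // Hand out exactly one element; the rest waits for the next call.
        action.accept(buffer.poll());
        return true;
    }
}
```

The caller's Consumer is invoked exactly once per successful tryAdvance, no matter how many elements the mapper produced, so the caller can stop after the first match without seeing the remaining ones.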