This is an implementation-dependent limitation. One thing developers concerned about parallel performance have to understand is that predictable stream sizes generally help parallel performance, as they allow the workload to be split evenly.
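As a minimal illustration of why a known size enables balanced splitting (class name is mine), a sized spliterator like the one backing IntStream.range can hand off half of its remaining elements when split:

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

public class SplitBalance {
    public static void main(String[] args) {
        // A sized spliterator knows how many elements remain,
        // so trySplit() can divide the work into two equal halves.
        Spliterator.OfInt s = IntStream.range(0, 100).spliterator();
        Spliterator.OfInt prefix = s.trySplit();
        System.out.println(prefix.estimateSize()); // 50
        System.out.println(s.estimateSize());      // 50
    }
}
```

Without a known size, the spliterator has no basis for such an even division, and the framework has to guess how much work to hand to each thread.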
The issue here is that combining an infinite stream, as created via Stream.generate(), with limit() does not produce a stream with a predictable size, even though it looks perfectly predictable to us. We can examine this using the following helper method:
static void sizeOf(String op, IntStream stream) {
final Spliterator.OfInt s = stream.spliterator();
System.out.printf("%-18s%5d, %d%n", op, s.getExactSizeIfKnown(), s.estimateSize());
}
Then
sizeOf("randoms with size", ThreadLocalRandom.current().ints(1000));
sizeOf("randoms with limit", ThreadLocalRandom.current().ints().limit(1000));
sizeOf("range", IntStream.range(0, 100));
sizeOf("range map", IntStream.range(0, 100).map(i->i));
sizeOf("range filter", IntStream.range(0, 100).filter(i->true));
sizeOf("range limit", IntStream.range(0, 100).limit(10));
sizeOf("generate limit", IntStream.generate(()->42).limit(10));
will print
randoms with size 1000, 1000
randoms with limit -1, 9223372036854775807
range 100, 100
range map 100, 100
range filter -1, 100
range limit -1, 100
generate limit -1, 9223372036854775807
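Under the hood, getExactSizeIfKnown() is based on the SIZED characteristic: it returns -1 exactly when the spliterator does not report SIZED. A short sketch (class name is mine) querying that characteristic directly:

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

public class SizedCheck {
    public static void main(String[] args) {
        // range produces a SIZED spliterator...
        System.out.println(IntStream.range(0, 100)
            .spliterator().hasCharacteristics(Spliterator.SIZED)); // true
        // ...but generate + limit loses the SIZED characteristic,
        // matching the -1 reported above.
        System.out.println(IntStream.generate(() -> 42).limit(10)
            .spliterator().hasCharacteristics(Spliterator.SIZED)); // false
    }
}
```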
So we see that certain sources like Random.ints(size) or IntStream.range(…) produce streams with a predictable size, and certain intermediate operations like map can carry that information along, as they know the size is not affected. Others, like filter and limit, do not propagate the size (as a known exact size).
It’s clear that filter cannot predict the actual number of elements, but it provides the source size as an estimate, which is reasonable insofar as that’s the maximum number of elements that can ever pass the filter.
In contrast, the current limit implementation does not provide a size, even when the source has an exact size and the predictable result size is as simple as min(source size, limit). Instead, it even reports a nonsensical estimate (the source’s size), despite the fact that the result can never contain more elements than the limit. In the case of an infinite stream, we face the additional obstacle that the Spliterator interface, on which streams are based, has no way to report that it is infinite. In these cases, infinite stream + limit returns Long.MAX_VALUE as an estimate, which means “I can’t even guess”.
Thus, as a rule of thumb, with the current implementation, a programmer should avoid limit whenever there is a way to specify the desired size beforehand at the stream’s source. But since limit also has significant (documented) drawbacks in the case of ordered parallel streams (which don’t apply to randoms or generate), most developers avoid limit anyway.
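To sketch that rule of thumb (class name is mine): prefer a sized source over an unsized one plus limit. For the generate case, a sized range mapped to the desired value keeps the known size, since map propagates it, as shown above:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class PreferSizedSources {
    public static void main(String[] args) {
        // Preferred: the size is specified at the source, so the
        // stream knows it has exactly 1000 elements.
        IntStream sized = ThreadLocalRandom.current().ints(1000);

        // Instead of IntStream.generate(() -> 42).limit(10),
        // use a sized range mapped to the constant value.
        IntStream constants = IntStream.range(0, 10).map(i -> 42);

        System.out.println(sized.spliterator().getExactSizeIfKnown());     // 1000
        System.out.println(constants.spliterator().getExactSizeIfKnown()); // 10
    }
}
```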