Can Stream#limit return fewer elements than expected?

Question

If the Stream s below has at least n elements, in what situations, if any, may the stream sLimit have fewer than n elements?

Stream<T> sLimit = s.limit(n);

Reason for the question: in this answer, I read that:

Despite the appearances, using limit(10) doesn't necessarily result in a SIZED stream with exactly 10 elements -- it might have fewer.

I think Stuart meant that statement in a conceptual sense. As in, it makes more sense to use a `Stream` of a known size than to attempt to take `N` from a `Stream` whose size you don't know. I don't think there's any possibility where `limit(n)` would return a new `Stream` with fewer than `n` elements if the original `Stream` had at least `n` elements. — Sotirios Delimanolis, Jan 22 '15 at 17:18
Wow, I step away for an hour and you guys are all over this already. :-) — Stuart Marks, Jan 22 '15 at 17:55
@StuartMarks I didn't mean to wake you up but thanks for the clarification :-) — assylias, Jan 22 '15 at 20:25
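Sotirios's reading is easy to check empirically. The sketch below applies `limit(n)` to sources that have at least `n` elements, both sequentially and in parallel, and to one source that has fewer. It then counts the result. The class `LimitCount` and its helper `limitedCount` are hypothetical names used only for this illustration.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class LimitCount {
    // Counts how many elements survive s.limit(n).
    static <T> long limitedCount(Stream<T> s, long n) {
        return s.limit(n).count();
    }

    public static void main(String[] args) {
        // Source with more than n elements: exactly n come out.
        System.out.println(limitedCount(Stream.of("a", "b", "c", "d", "e"), 3)); // 3
        // Parallel source with many more than n elements: still exactly n.
        System.out.println(limitedCount(IntStream.range(0, 1_000).parallel().boxed(), 10)); // 10
        // Only a source with fewer than n elements yields fewer than n.
        System.out.println(limitedCount(Stream.of("x", "y"), 3)); // 2
    }
}
```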

Holger · answered Jan 22 '15 at 17:29

You misunderstood the statement. If the Stream has at least n elements and you invoke limit(n) on it, the result will have exactly n elements. However, the Stream implementation might not be aware of that, and so it may perform less than optimally.

In contrast, certain Stream sources (Spliterators) know for sure that they have a fixed size, e.g. when creating a Stream for an array or an IntStream via IntStream.range. They can be optimized better than a Stream with a limit(n).
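You can see which sources carry the SIZED characteristic by asking a stream for its spliterator before running it. This is a minimal sketch. It assumes that a stream with no intermediate operations hands back its source spliterator unchanged, which is true of the OpenJDK implementation but is not promised by the Javadoc. `SizedSources` and `isSized` are made-up names.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SizedSources {
    // True if the spliterator reports SIZED, i.e. its exact size is known up front.
    static boolean isSized(Spliterator<?> sp) {
        return sp.hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        String[] array = {"a", "b", "c"};
        Spliterator<String> fromArray = Arrays.stream(array).spliterator();
        Spliterator.OfInt fromRange = IntStream.range(0, 100).spliterator();
        Spliterator<Object> fromGenerate = Stream.generate(Object::new).spliterator();

        // getExactSizeIfKnown() returns -1 when the size is not known in advance.
        System.out.println(isSized(fromArray) + " " + fromArray.getExactSizeIfKnown());       // true 3
        System.out.println(isSized(fromRange) + " " + fromRange.getExactSizeIfKnown());       // true 100
        System.out.println(isSized(fromGenerate) + " " + fromGenerate.getExactSizeIfKnown()); // false -1
    }
}
```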

When you create a parallel Stream via Stream.generate(MyClass::new).limit(10), the constructor will still be invoked sequentially and only follow-up operations might run in parallel. In contrast, when using IntStream.range(0, n).mapToObj(i -> new MyClass()), the entire Stream operation, including the constructor calls, can run in parallel.
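Here is a sketch of the two approaches from the paragraph above. It uses a hypothetical `MyClass` that counts its constructor calls, and both pipelines yield 10 elements. With the range source, each index maps to exactly one constructor call. For the `Stream.generate` version, the sketch claims only that there are at least 10 calls. It also does not try to observe which threads run the constructor, since that depends on the JDK and on scheduling.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class GenerateVsRange {
    // Hypothetical stand-in for MyClass; counts constructor invocations.
    static final class MyClass {
        static final AtomicInteger CREATED = new AtomicInteger();
        MyClass() { CREATED.incrementAndGet(); }
    }

    // Unsized, infinite source truncated by limit(n).
    static List<MyClass> viaGenerate(int n) {
        return Stream.generate(MyClass::new).parallel().limit(n).collect(Collectors.toList());
    }

    // SIZED range source; each index is mapped to one new object.
    static List<MyClass> viaRange(int n) {
        return IntStream.range(0, n).parallel().mapToObj(i -> new MyClass()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        MyClass.CREATED.set(0);
        int generated = viaGenerate(10).size();
        System.out.println("generate: " + generated + " elements, " + MyClass.CREATED.get() + " constructor calls");

        MyClass.CREATED.set(0);
        int ranged = viaRange(10).size();
        System.out.println("range: " + ranged + " elements, " + MyClass.CREATED.get() + " constructor calls");
    }
}
```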

score 5 · Accepted Answer · answered Jan 22 '15 at 18:31

I think Holger's and Sotirios' answers are accurate, but inasmuch as I'm the guy who made the statement, I guess I should explain myself.

I'm mainly talking about spliterator characteristics, in particular the SIZED characteristic. This is basically "static" information about the stream stages that is known at pipeline setup time, but before the stream actually executes. Indeed, it's used for determining the execution strategy for the stream, so it has to be known before the stream executes.

The limit() operation creates a spliterator that wraps its upstream spliterator, so the limit spliterator needs to determine what characteristics to return. Even if its upstream spliterator is SIZED, the limit spliterator doesn't know the exact size of its own output, so it has to turn off the SIZED characteristic.
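The wrapping described above can be sketched as a hand-written spliterator. This is a simplified, sequential-only illustration. The class `LimitingSpliterator` and the helper `limit` are hypothetical and are not the JDK's internal classes. Like the limit spliterator described in this answer, it clears SIZED unconditionally, because it can only state an upper bound on how many elements will come out.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

public class LimitSpliteratorSketch {
    // Hypothetical, simplified limit spliterator; NOT the JDK's implementation.
    static final class LimitingSpliterator<T> implements Spliterator<T> {
        private final Spliterator<T> upstream;
        private long remaining; // how many more elements may still pass through

        LimitingSpliterator(Spliterator<T> upstream, long limit) {
            this.upstream = upstream;
            this.remaining = limit;
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            // Stop once the limit is reached, or earlier if upstream runs dry.
            if (remaining <= 0 || !upstream.tryAdvance(action)) {
                return false;
            }
            remaining--;
            return true;
        }

        @Override
        public Spliterator<T> trySplit() {
            return null; // sequential-only sketch: never splits
        }

        @Override
        public long estimateSize() {
            // An upper bound, not an exact count.
            return Math.min(remaining, upstream.estimateSize());
        }

        @Override
        public int characteristics() {
            // Clear SIZED and SUBSIZED unconditionally, as the answer describes.
            return upstream.characteristics() & ~(Spliterator.SIZED | Spliterator.SUBSIZED);
        }

        @Override
        public Comparator<? super T> getComparator() {
            return upstream.getComparator(); // truncation keeps any SORTED order
        }
    }

    static <T> LimitingSpliterator<T> limit(Spliterator<T> upstream, long n) {
        return new LimitingSpliterator<>(upstream, n);
    }

    public static void main(String[] args) {
        Integer[] data = new Integer[100];
        Arrays.setAll(data, i -> i);
        Spliterator<Integer> up = Arrays.spliterator(data);
        System.out.println(up.hasCharacteristics(Spliterator.SIZED));      // true
        LimitingSpliterator<Integer> limited = limit(up, 10);
        System.out.println(limited.hasCharacteristics(Spliterator.SIZED)); // false
        System.out.println(StreamSupport.stream(limited, false).count());  // 10
    }
}
```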

So if you, the programmer, were to write:

IntStream.range(0, 100).limit(10)

you'd say of course that stream has exactly 10 elements. (And it will.) But the resulting spliterator is still not SIZED. After all, the limit operator doesn't know the difference between the above and this:

IntStream.range(0, 1).limit(10)

at least in terms of spliterator characteristics.
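Both pipelines above apply the very same `limit(10)`. Only the upstream range differs, and that alone decides how many elements come out. The small check below shows this; the helper `countAfterLimit` is a hypothetical name.

```java
import java.util.stream.IntStream;

public class SameLimitDifferentSizes {
    // Applies limit(n) to a range of the given size and counts the result.
    static long countAfterLimit(int upstreamSize, int n) {
        return IntStream.range(0, upstreamSize).limit(n).count();
    }

    public static void main(String[] args) {
        System.out.println(countAfterLimit(100, 10)); // 10
        System.out.println(countAfterLimit(1, 10));   // 1
    }
}
```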

So that's why, even though there are times when it seems like it ought to, the limit operator doesn't return a stream of known size. This in turn affects the splitting strategy, which impacts parallel efficiency.
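When you control the source, one way to sidestep this is to build a stream of the right size to begin with instead of truncating a larger one. The sketch below computes the same sum both ways, and the results are identical. The only difference is what the framework knows when it plans the parallel split. No timings are shown because they depend on the JDK version and hardware. The class and method names are hypothetical.

```java
import java.util.stream.IntStream;

public class PreferSizedSource {
    // First n non-negative ints, obtained by truncating a larger range (not SIZED after limit).
    static long sumViaLimit(int n) {
        return IntStream.range(0, 2 * n).parallel().limit(n).asLongStream().sum();
    }

    // The same elements from a range of exactly n elements (SIZED source).
    static long sumViaSizedRange(int n) {
        return IntStream.range(0, n).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println(sumViaLimit(n));      // 499999500000
        System.out.println(sumViaSizedRange(n)); // 499999500000
    }
}
```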
