The point is that the act of creating a stream and iterating over it is potentially expensive.
Consider this simple one: Files.lines. It opens the file and reads it, line by line, exposing that as a Stream<String>.
You can, of course, wrap this in a supplier, () -> Files.lines(somePath), and pass that around. Anybody you give that supplier to can invoke it, which results in the OS opening a file handle and asking the disk to go spit out some bits.
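As a rough sketch of what such a supplier looks like in practice: Files.lines declares a checked IOException, so the lambda needs a try/catch before it fits java.util.function.Supplier. The class name and the helpers linesOf and countLines are made up for illustration; the temp file stands in for somePath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class LinesSupplierDemo {
    // Hypothetical helper: wraps Files.lines in a Supplier, rethrowing the
    // checked IOException as an unchecked one.
    public static Supplier<Stream<String>> linesOf(Path path) {
        return () -> {
            try {
                return Files.lines(path); // opens the file anew on every get()
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    // Hypothetical consumer: counts the lines and closes the stream so the
    // file handle is released.
    public static long countLines(Supplier<Stream<String>> source) {
        try (Stream<String> lines = source.get()) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt"); // stand-in for somePath
        Files.write(tmp, List.of("a", "b", "c"));
        Supplier<Stream<String>> supplier = linesOf(tmp);
        // Each call pays the full cost again: open the file, read it end to end.
        System.out.println(countLines(supplier));
        System.out.println(countLines(supplier));
        Files.delete(tmp);
    }
}
```

Every call to get() hits the file system again, which is exactly the cost that whoever receives the supplier cannot see from its type.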
But the point is that the cost here is in producing the stream. The actual thing you do with the stream? Dwarfed, utterly, by the disk access. (With fast SSDs this matters less. If it helps, imagine instead a file loaded over a metered network connection.)
Some streams aren't anything like that. If you have an ArrayList and you call .stream() on it, the 'processing' cost of the stream itself (making the stream object and streaming through the list's elements) is negligible: it's just advancing a pointer through contiguous memory, no less. It doesn't even cache miss much.
But Stream is an abstraction, and the point is that it sucks to have to care. It's bad to have to document "if you pass a supplier that supplies a stream with high setup and processing costs, this code is quite slow, and if it's loaded across a metered network it'll really rack up your bill".
So, don't. Which means, the best thing to do is to stream only once.
Unfortunately, there is no simple way to make 2 Stream objects that are powered by a single pass through the underlying data.
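To illustrate why a second pass needs a second stream: a Stream object can only be consumed once, and a second terminal operation on it throws IllegalStateException. A minimal sketch, with a made-up class and method name:

```java
import java.util.List;
import java.util.stream.Stream;

public class SingleUseStreamDemo {
    // Returns true if a second terminal operation on the same stream is rejected.
    public static boolean secondPassFails() {
        Stream<String> stream = List.of("a", "b", "c").stream();
        stream.forEach(x -> { });   // first terminal operation consumes the stream
        try {
            stream.count();         // second terminal operation on the same object
            return false;
        } catch (IllegalStateException e) {
            return true;            // the stream has already been operated upon
        }
    }

    public static void main(String[] args) {
        System.out.println(secondPassFails());
    }
}
```

So getting both a count and the elements means either asking the source for a fresh stream (and paying the setup cost twice) or doing both jobs in one pass.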
One way to do this count thing would be:
AtomicInteger counter = new AtomicInteger();
Stream<String> dataCollection = ....;
dataCollection.peek(x -> counter.incrementAndGet()).forEach(...);
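It's annoying to have to do it like this. A key problem here is that lambdas don't like changing state (they can only capture effectively final local variables, hence the AtomicInteger instead of a plain int), and methods can only return one thing.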
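For reference, a self-contained version of the counter approach above might look like the following. The class name, the processAndCount method and the List used as a sink are assumptions made for illustration; the real forEach body would be whatever processing you need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class CountWhileStreamingDemo {
    // Processes every element exactly once and reports how many there were.
    // The sink stands in for the real per-element work.
    public static int processAndCount(Stream<String> dataCollection, List<String> sink) {
        // A plain int local cannot be modified from inside a lambda,
        // so a mutable holder carries the count out.
        AtomicInteger counter = new AtomicInteger();
        dataCollection
                .peek(x -> counter.incrementAndGet()) // count as elements flow past
                .forEach(sink::add);                  // the single pass over the data
        return counter.get();
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int count = processAndCount(Stream.of("a", "b", "c"), seen);
        System.out.println(count + " elements: " + seen);
    }
}
```

The count is only meaningful after the terminal operation has finished, since peek runs lazily as elements are pulled through the pipeline.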