Calling `grouped` on a Scala `Stream` seems to buffer the entire stream into memory. I've dug into this quite a bit to determine which class is holding the reference to the Stream's head.
A simple example:

    lazy val stream1: Stream[Int] = {
      def loop(v: Int): Stream[Int] = v #:: loop(v + 1)
      loop(0)
    }
    stream1.take(1000).grouped(10).foreach(println)

If one runs this code and places a breakpoint inside the function passed to `foreach`, one can see that a reference to the Stream's head is still being held as elements are drawn out.
After several iterations, there are still references to earlier "chunks" of the Stream in memory:
Additionally, if we inspect the reference to the head of the Stream, we can see that it is held by some lambda within `IterableLike`.
When `grouped` is called on the `Stream`, the Collections library first calls `iterator` on the `Stream`, returning a `StreamIterator`, and then calls `grouped` on that iterator, returning a `GroupedIterator`. The screenshots above suggest that something within `GroupedIterator` is holding onto the head of the Stream, but I cannot determine what.
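To make that call chain concrete, here is a minimal sketch (assuming Scala 2.11/2.12 collections, where `Stream` is not yet deprecated; it also compiles on 2.13 with deprecation warnings) that spells out the `iterator`-then-`grouped` pipeline by hand next to the original one-liner. `GroupedChain`, `naturals`, `viaStreamGrouped` and `viaIteratorGrouped` are hypothetical names used only for illustration. Re-running the breakpoint experiment against the hand-written pipeline in `main` could help narrow down whether the retained reference comes from the `IterableLike` layer or from `StreamIterator`/`GroupedIterator` themselves.

```scala
object GroupedChain {
  // Same infinite stream as in the question, but produced by a def
  // (hypothetical helper), so no long-lived val or lazy val refers to its head.
  def naturals(start: Int): Stream[Int] = start #:: naturals(start + 1)

  // The original one-liner: Stream#grouped.
  // Groups are materialized into Lists only so small results can be compared.
  def viaStreamGrouped(n: Int, size: Int): List[List[Int]] =
    naturals(0).take(n).grouped(size).map(_.toList).toList

  // The same pipeline spelled out: obtain the stream's iterator
  // (a StreamIterator), then call grouped on it (a GroupedIterator).
  def viaIteratorGrouped(n: Int, size: Int): List[List[Int]] =
    naturals(0).take(n).iterator.grouped(size).map(_.toList).toList

  def main(args: Array[String]): Unit = {
    // Hand-written chain, consumed lazily; place the breakpoint inside this foreach.
    naturals(0).take(1000).iterator.grouped(10).foreach(println)
  }
}
```

Both helpers should yield the same groups (for 25 elements in groups of 10: sizes 10, 10 and 5); the only difference is whether `IterableLike.grouped` sits between the `Stream` and the `GroupedIterator`.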
My question is twofold:
1. Is this expected behavior with Scala Streams?
2. If not, what is happening within the implementation of `StreamIterator` and `GroupedIterator` that causes the head of a `Stream` to be held onto while running `.grouped(N)` on a `Stream`?