
Calling grouped on a Scala Stream seems to buffer the entire stream into memory. I've dug into this quite a bit to determine which class is holding the reference to the Stream's head.

A simple example:

lazy val stream1: Stream[Int] = {
  def loop(v: Int): Stream[Int] = v #:: loop(v + 1)
  loop(0)
}

stream1.take(1000).grouped(10).foreach(println)
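For contrast, here is a minimal sketch that produces the same 100 groups from an `Iterator` instead of a `Stream`. It assumes the groups only need to be consumed once, so memoization isn't needed. `IteratorComparison` is a hypothetical name. An Iterator never caches its elements, so there is no head to retain.

```scala
object IteratorComparison {
  // Same values as stream1.take(1000), but produced by an Iterator.
  // Iterators do not memoize: once a group has been consumed, nothing
  // in this pipeline refers to it any more.
  def groups: Iterator[Seq[Int]] =
    Iterator.from(0).take(1000).grouped(10)

  def main(args: Array[String]): Unit =
    groups.foreach(println)
}
```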

If one runs this code and places a breakpoint inside the foreach function, one can see that a reference to the Stream's head is still held as elements are drawn from it.

After several iterations, references to earlier "chunks" of the Stream are still in memory:

[Screenshot: Stream Cons in memory after a few iterations]

Additionally, if we inspect the reference to the head of the Stream, we can see that some lambda within IterableLike is holding a reference.


When grouped is called on the Stream, the Collections library first calls iterator on the Stream, which returns a StreamIterator, and then calls grouped on that iterator, which returns a GroupedIterator. The screenshots above suggest that something within GroupedIterator is holding onto the head of the Stream, but I cannot determine what.
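One hypothesis is consistent with the debugger showing a lambda within IterableLike holding the head. It is an assumption, not a confirmed diagnosis. In Scala 2.12, `IterableLike.grouped` appears to map over `iterator grouped size` with a closure that calls the collection's own `newBuilder`. If so, that closure has to capture `this`, which is the Stream's head. The class below is a hypothetical mimic of that shape, not the library source. `Mimic` and its private `newBuilder` are names introduced only for illustration.

```scala
// Hypothetical mimic (assumption) of a grouped implementation whose mapping
// closure calls an instance method. Because the lambda needs `this` to call
// newBuilder, it captures this Mimic, which in turn holds `self` (the head).
final class Mimic[A](val self: Stream[A]) {
  private def newBuilder = List.newBuilder[A]

  def grouped(size: Int): Iterator[List[A]] =
    for (xs <- self.iterator.grouped(size)) yield {
      val b = newBuilder // instance call: the closure keeps `this` alive
      b ++= xs
      b.result()
    }
}
```

As long as the returned iterator is reachable, its mapping function is reachable too. That keeps `self` reachable, and through memoization it keeps every cell the iterator has already forced.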

My question is twofold:

1. Is this expected behavior with Scala Streams?
2. If not, what is happening within the implementation of StreamIterator and GroupedIterator that causes the head of a Stream to be held onto while running .grouped(N) on a Stream?
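On the first part, one relevant property is that a Stream memoizes each cell once it has been computed. Anything that still references the head therefore keeps every forced cell reachable. The minimal sketch below shows that memoization. The object `MemoDemo` and its counter `built` are hypothetical names used for illustration.

```scala
object MemoDemo {
  // Counts how many Stream cells have been constructed (how many times loop ran).
  var built = 0

  def loop(v: Int): Stream[Int] = {
    built += 1
    v #:: loop(v + 1) // the tail stays unevaluated until something asks for it
  }

  def main(args: Array[String]): Unit = {
    val s = loop(0)
    s.take(5).toList // forces the first few cells
    val afterFirst = built
    s.take(5).toList // second pass reuses the cached cells instead of calling loop
    println(s"after first pass: $afterFirst, after second pass: $built")
  }
}
```

The two printed counts are equal. The second traversal reads cells that `s` keeps alive, which is the same mechanism that makes a retained head pin the whole consumed prefix.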

Christian Benincasa
  • Isn't `stream1`, which is a `val`, holding on to the head? What happens if there is no `stream1`, i.e. `loop(0).take(....` etc.? – jwvh Feb 21 '19 at 23:37
  • @jwvh: `object Foo { def main(args: Array[String]): Unit = { Stream.continually(0) .grouped(100) .zipWithIndex.foreach{ case (x,i) => if (i % 100 == 0) println(System.currentTimeMillis + " " + i) } } }` also just freezes after a while. – Andrey Tyukin Feb 22 '19 at 00:30
  • @jwvh true - the "lazy val" setup is pulled from the Stream scaladoc itself; it doesn't seem to hold onto the head itself (still wrapping my brain around that one a bit too). For completeness, I tried your suggestion like: `loop(0).take(1000).grouped(10).foreach(println)` and still see the same behavior with a breakpoint within that println – Christian Benincasa Feb 22 '19 at 01:08
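The comment thread suggests one more variant to try. It rests on two assumptions. First, no `val` holds the Stream, because it is built inside a `def` as jwvh suggested. Second, the Stream's own `grouped` is bypassed by asking for the iterator first and grouping that, so no closure in the pipeline is expected to refer back to the Stream. Whether this actually lets consumed cells be collected in the debugger is untested here. `IteratorFirst`, `naturals`, and `chunks` are hypothetical names.

```scala
object IteratorFirst {
  // Built inside a def, so no val or field holds the Stream's head.
  def naturals: Stream[Int] = {
    def loop(v: Int): Stream[Int] = v #:: loop(v + 1)
    loop(0)
  }

  // Take the StreamIterator first, then group the iterator directly
  // (assumption: nothing in this pipeline captures the Stream itself).
  def chunks(n: Int, size: Int): Iterator[Seq[Int]] =
    naturals.take(n).iterator.grouped(size)

  def main(args: Array[String]): Unit =
    chunks(1000, 10).zipWithIndex.foreach { case (xs, i) =>
      if (i % 10 == 0) println(s"$i ${xs.head}")
    }
}
```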
