3

I am comfortable with streams, but I admit I am puzzled by this behavior:

import collection.immutable.Stream
object StreamForceTest extends App {
  println("Computing fibs")
  val fibs: Stream[BigInt] = BigInt(0) #:: BigInt(1) #::
     fibs.zip(fibs.tail).map((x: (BigInt, BigInt)) => {
       println("Adding " + x._1 + " and " + x._2);
       x._1 + x._2
     })
  println("Taking first 5 elements")
  val fibs5 = fibs.take(5)
  println("Computing length of that prefix")
  println("fibs5.length = " + fibs5.length)
}

with output

Computing fibs
Taking first 5 elements
Computing length of that prefix
Adding 0 and 1
Adding 1 and 1
Adding 1 and 2
fibs5.length = 5

Why should take(5) not force the stream's values to be computed, while length does do so? Offhand neither one needs to actually look at the values, but I would have thought that take was more likely to do it than length. Inspecting the source code on github, we find these definitions for take (including an illuminating comment):

override def take(n: Int): Stream[A] = (
  // Note that the n == 1 condition appears redundant but is not.
  // It prevents "tail" from being referenced (and its head being evaluated)
  // when obtaining the last element of the result. Such are the challenges
  // of working with a lazy-but-not-really sequence.
  if (n <= 0 || isEmpty) Stream.empty
  else if (n == 1) cons(head, Stream.empty)
  else cons(head, tail take n-1)
)

and length:

override def length: Int = {
  var len = 0
  var left = this
  while (!left.isEmpty) {
    len += 1
    left = left.tail
  }
  len
}

The definition of head and tail is obtained from the specific subclass (Empty and Cons). (Of course Empty is an object, not a class, and its definitions of head and tail just throw exceptions.) There are subtleties, but they seem to concern making sure that the tail of a Cons is evaluated lazily; the head definition is straight out of lecture 0 on Scala constructors. Note that length doesn't go near head, but it's the one that does the forcing.

All this is part of a general puzzlement about how close Scala streams are to Haskell lists. I thought Haskell treated head and tail symmetrically (I'm not a serious Haskell hacker), and Scala forced head evaluation in more circumstances. I'm trying to figure out exactly what those circumstances are.

airfoyle
  • 443
  • 2
  • 13
  • I'm not comfortable with streams, but for an Iterator, I'd expect take to be lazy. But I'm used to `iterator.size` exhausting the iterator just to check the length. Is there some other context or language in which take implies force? Just curious. I can understand that the verb take sounds like it forces a value. – som-snytt Jul 18 '16 at 00:14
  • I am not sure what puzzles you (especially, if you are really "comfortable with the streams"): `.take(5)` only needs to scan through the first five members of the stream to compute the result. `.length` needs to go all the way until the last element to count them. Really, I just can't see what can possibly be puzzling about this. – Dima Jul 18 '16 at 01:19
  • @Dima I don't understand. take evaluates nothing, because it's still lazy, but length must force. So what does `needs to scan through the first five members` mean? Also, `if you are really "comfortable with the streams"`? E.g., I was going for "self-deprecating humor" when I said "I'm not comfortable with streams", I hope it doesn't come across as sarcasm. In any case, our self-understanding is what it is. – som-snytt Jul 18 '16 at 05:03
  • @Dima Thanks; yes of course I should have realized the recursion in `take` would be suspended, because the recursive call is inside a `cons`. But if you wrote `length` recursively: `if (s.isEmpty) 0 else 1 + tail.length`, an asymmetry would be revealed. You can't access the tail without forcing the head. It doesn't *have* to be that way. – airfoyle Jul 20 '16 at 20:42
  • @airfoyle I don't understand what you are saying I am afraid. Perhaps, this is just too computer-sciency for me :) I mean, I am not sure about what "asymetry" you are talking about, but isn't it clear, even without knowing anything about the implementation, that in order to tell the length of the stream, you have to scan through it to the end, and to set a limit on far you can iterate through it you ... well, don't need to do anything? :) Or is your question specifically about implementation, because you were confused by the pass-by-name parameter to cons? – Dima Jul 20 '16 at 22:19
  • @Dima Sorry. The asymmetry is that when you access the tail you force the head, but when you access the head you don't force the tail. To show it doesn't have to be that way, it's easy to implement a `LazierStream` in which both constructor parameters are call-by-name and stored as lazy vals. Calculating its length would not force any heads. – airfoyle Jul 21 '16 at 14:21
  • @airfoyle I still don't understand. How can you calculate length without forcing head, _and every element after it_? Unless, you just happen to know how many elements are in the stream, that's impossible, whether the head is lazy or not. – Dima Jul 21 '16 at 14:39
  • @airfoyle, perhaps, if you showed a sample implementation of what you have in mind as "LazierStream", and demonstrated how `.length` would work, it would help me understand what you mean. – Dima Jul 21 '16 at 14:46

2 Answers2

4

Stream's head is strict and its tail is lazy, as you can see in cons.apply and in the Cons constructor:

def apply[A](hd: A, tl: => Stream[A]) = new Cons(hd, tl)

class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A]

Notice the context in which the take method refers to tail:

cons(head, tail take n-1)

Because the expression tail take n-1 is used as the second argument to cons, which is passed by name, it doesn't force evaluation of tail take n-1, thus doesn't force evaluation of tail.

Whereas in length, the statement

left = left.tail

, by assigning left.tail to a var, does force its evaluation.

Scala is "strict by default". In most situations, everything you reference will be evaluated. We only have lazy evaluation in cases where a method/constructor parameter declares an call-by-name argument with =>, and in the culture we don't typically use this unless there's a special reason.

Chris Martin
  • 30,334
  • 10
  • 78
  • 137
  • I'm not sure about `by assigning left.tail to a var`. Just saying `left.tail` suffices to invoke it; assignment is, as they say, orthogonal? Maybe you mean passing the expr to a method by-name would not eval it. `in the culture` is a very interesting phrase. – som-snytt Jul 18 '16 at 00:19
  • 1
    I mean any context other than into a by-name parameter forces it, and the RHS of an assignment is one such context. Not to imply that assignment is special. To say "Just saying `left.tail` suffices to invoke it" may be misleading, because it does depend on the context in which you say it. – Chris Martin Jul 18 '16 at 00:21
1

Let me offer another answer to this, one that just looks from a high level, i.e. without actually considering the code.

If you want to know how long a Stream is, you must evaluate it all the way to the end. Otherwise, you can only guess at its length. Admittedly, you may not actually care about the values (since you only want to count them) but that's immaterial.

On the other hand, when you "take" a certain number of elements from a stream (or indeed any collection) you are simply saying that you want at most that number of elements. The result is still a stream even though it may have been truncated.

Phasmid
  • 923
  • 7
  • 19
  • Is that true? I can imagine a Stream that knows it produces ten elements, but calculates them lazily because it takes a day to run each calculation. From a high level, not looking at code, I don't understand the "must" evaluate for length. – som-snytt Jul 18 '16 at 05:07
  • 1
    @som-snytt I too can imagine such a data structure that knows about the ten lazily-evaluated elements. But it wouldn't be Stream because a Stream _cannot_ know anything about its tail until it is forced to be evaluated. – Phasmid Jul 19 '16 at 01:30