Why stream fold operation throws Out of memory exception?

Question

I have following simple code

 def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
 (0l /: fib(1,1).take(10000000)) (_+_)

And it throws OutOfMemmoryError exception. I can not understand why, because I think all the parts use constant memmory i.e. lazy evaluation streams and foldLeft...

Those code also don't work

fib(1,1).take(10000000).sum or max, min e.t.c.

How to correctly implement infinite streams and do iterative operations upon it?

Scala version: 2.9.0

Also scala javadoc said, that foldLeft operation is memmory safe for streams

  /** Stream specialization of foldLeft which allows GC to collect
   *  along the way.
   */
  @tailrec
  override final def foldLeft[B](z: B)(op: (B, A) => B): B = {
    if (this.isEmpty) z
    else tail.foldLeft(op(z, head))(op)
  }

EDIT:

Implementation with iterators still not useful, since it throws ${domainName} exception

   def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++  fib(j, i + j)

How to define correctly infinite stream/iterator in Scala?

EDIT2: I don't care about int overflow, I just want to understand how to create infinite stream/iterator etc in scala without side effects .

not sure, but might be useful: http://stackoverflow.com/questions/4132924/functional-processing-of-scala-streams-without-outofmemory-errors/4136670 — axel22, Sep 05 '11 at 14:30

Rex Kerr · Accepted Answer · 2011-09-05T15:03:22.943

7

The reason to use Stream instead of Iterator is so that you don't have to calculate all the small terms in the series over again. But this means that you need to store ten million stream nodes. These are pretty large, unfortunately, so that could be enough to overflow the default memory. The only realistic way to overcome this is to start with more memory (e.g. scala -J-Xmx2G). (Also, note that you're going to overflow Long by an enormous margin; the Fibonacci series increases pretty quickly.)

P.S. The iterator implementation I have in mind is completely different; you don't build it out of concatenated singleton Iterators:

def fib(i: Long, j: Long) = Iterator.iterate((i,j)){ case (a,b) => (b,a+b) }.map(_._1)

Now when you fold, past results can be discarded.

edited Sep 05 '11 at 15:03

answered Sep 05 '11 at 09:32

Rex Kerr

166,841
26
322
407

Fib actually has a very nice approach that *doesn't* use memoization (memoization is often used for trying to speed up the naive `O(fib n)` approach) and is TCO... there as some ideas [on this Fib-Haskell page](http://www.haskell.org/haskellwiki/The_Fibonacci_sequence) – Sep 05 '11 at 09:35
1

But why I scala stores all intermiadiate stream nodes, it have to be garbage collected... Stream are useful for indefinite collections as scala book describe them... so I don't get your point. – yura Sep 05 '11 at 10:05
@yura it sounds like you're confusing Stream and Iterator. Stream is just a lazy List so of course all the calculated elements are retained. – Luigi Plinge Sep 05 '11 at 11:34
2

@Luigi: That's not as obvious as you make it out to be. In Haskell the given code would run in constant space because each node of the list would be garbage collected as soon as the fold is done with that node (the nodes would only be retained if you assigned the list to a variable before folding over it and then kept using the list afterwards). – sepp2k Sep 05 '11 at 11:42
iterator impl throw StackOverflow. so may be any ideas how to define infinite stream in Scala? – yura Sep 05 '11 at 12:09
@yura Rex's implementation above is correct and doesn't give out of memory errors. You should accept this answer IMO. On the other hand, it's 33 times slower than the recursive solution, so take your pick if you're implementing something like this for real. – Luigi Plinge Sep 07 '11 at 16:29

anrizal - Anwar Rizal · Answer 2 · 2011-09-05T13:23:30.173

The OutOfMemoryError happens indenpendently from the fact that you use Stream. As Rex Kerr mentioned above, Stream -- unlike Iterator -- stores everything in memory. The difference with List is that the elements of Stream are calculated lazily, but once you reach 10000000, there will be 10000000 elements, just like List.

Try with new Array[Int](10000000), you will have the same problem.

To calculate the fibonacci number as above you may want to use different approach. You can take into account the fact that you only need to have two numbers, instead of the whole fibonacci numbers discovered so far.

For example:

scala> def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++  fib(j, i + j)
fib: (i: Long,j: Long)Iterator[Long]

And to get, for example, the index of the first fibonacci number exceeding 1000000:

scala> fib(1, 1).indexWhere(_ > 1000000)
res12: Int = 30

Edit: I added the following lines to cope with the StackOverflow

If you really want to work with 1 millionth fibonacci number, the iterator definition above will not work either for StackOverflowError. The following is the best I have in mind at the moment:

  class FibIterator extends Iterator[BigDecimal] {
       var i: BigDecimal = 1
       var j: BigDecimal = 1
       def next = {val temp = i 
                   i = i + j
                   j = temp  
                   j }
       def hasNext = true
    }
scala> new FibIterator().take(1000000).foldLeft(0:BigDecimal)(_ + _)
res49: BigDecimal = 82742358764415552005488531917024390424162251704439978804028473661823057748584031
0652444660067860068576582339667553466723534958196114093963106431270812950808725232290398073106383520
9370070837993419439389400053162345760603732435980206131237515815087375786729469542122086546698588361
1918333940290120089979292470743729680266332315132001038214604422938050077278662240891771323175496710
6543809955073045938575199742538064756142664237279428808177636434609546136862690895665103636058513818
5599492335097606599062280930533577747023889877591518250849190138449610994983754112730003192861138966
1418736269315695488126272680440194742866966916767696600932919528743675517065891097024715258730309025
7920682881137637647091134870921415447854373518256370737719553266719856028732647721347048627996967...

With iterator I have Stackoverflow exception, so they still not useful for defining infinite stream that way... — yura, Sep 05 '11 at 12:07
What operation leads to Stack overflow ? foldLeft(0)(_ + _) ? Note that, 1000000th fibonacci is a big number, the sum of all will be even bigger. Anyway, I edited my answer for StackOverflow — anrizal - Anwar Rizal, Sep 05 '11 at 13:24
Using Iterators (at least in the manner suggested) is a very bad solution to this problem. It has to re-compute every previous state every time so it's a bit like a Stream without memoization, and complexity will be something like O(e^n). For an infinite stream this is not good! Note the demo only computes 30 elements. Don't try summing 10000 elements (I did, and had to kill my scala session after running for several minutes). — Luigi Plinge, Sep 05 '11 at 13:30
@anrizal Thanks, but now it is very ugly, non functional code... What I'm start worrying about now that scala is not very good for functional programming. — yura, Sep 05 '11 at 14:14

Keith Pinson · Answer 3 · 2012-02-29T17:05:53.527

@yura's problem:

def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)

besides using a Long which can't possibly hold the Fibonacci of 10,000,000, it does work. That is, if the foldLeft is written as:

fib(1,1).take(10000000).foldLeft(0L)(_+_)

Looking at the Streams.scala source, foldLeft() is clearly designed for Garbage Collection, but /: is not def'd.

The other answers alluded to another problem. The Fibonacci of 10 million is a big number and if BigInt is used, instead of just overflowing like with a Long, absolutely enormous numbers are being added to each over and over again.

Since Stream.foldLeft is optimized for GC it does look like the way to solve for really big Fibonacci numbers, rather than using a zip or tail recursion.

// Fibonacci using BigInt
def fib(i:BigInt,j:BigInt):Stream[BigInt] = i #:: fib(j, i+j)
fib(1,0).take(10000000).foldLeft(BigInt("0"))(_+_)

Results of the above code: 10,000,000 is a 8-figure number. How many figures in fib(10000000)? 2,089,877

Didier Dupont · Answer 4 · 2011-09-05T14:41:41.337

1

fib(1,1).take(10000000) is the "this" of the method /:, it is likely that the JVM will consider the reference alive as long as the method runs, even if in this case, it might get rid of it. So you keep a reference on the head of the stream all along, hence on the whole stream as you build it to 10M elements.

edited Sep 05 '11 at 14:41

answered Sep 05 '11 at 09:38

Didier Dupont

29,398
7
71
90

Hm. Shouldn't it be the case, that the element returned by `take(1000000)` is the object, on which `/:` is called? – Dirk Sep 05 '11 at 09:43
You're right, the take result is the target, but it amounts to the same thing, reference to the full stream, actual computation caused by fold. – Didier Dupont Sep 05 '11 at 12:31
Ouch! Just noticed, that I confused `take` and `drop`. Consider my comment withdrawn. – Dirk Sep 05 '11 at 14:34
Well maybe some unwritten consequences were false, but everything in the comment was right. I'll fix my answer. – Didier Dupont Sep 05 '11 at 14:41

score 1 · Answer 5 · answered Sep 05 '11 at 17:13

You could just use recursion, which is about as simple:

def fibSum(terms: Int, i: Long = 1, j: Long = 1, total: Long = 2): Long = {
  if (terms == 2) total
  else fibSum(terms - 1, j, i + j, total + i + j)
}

With this, you can "fold" a billion elements in only a couple of seconds, but as Rex points out, summing the Fibbonaci sequence overflows Long very quickly.

If you really wanted to know the answer to your original problem and don't mind sacrificing some accuracy you could do this:

def fibSum(terms: Int, i: Double = 1, j: Double = 1, tot: Double = 2,
                                                     exp: Int = 0): String = {
  if (terms == 2) "%.6f".format(tot) + " E+" + exp
  else {
    val (i1, j1, tot1, exp1) =
      if (tot + i + j > 10) (i/10, j/10, tot/10, exp + 1) 
      else                  (i, j, tot, exp)
    fibSum(terms - 1, j1, i1 + j1, tot1 + i1 + j1, exp1)
  }
}

scala> fibSum(10000000)
res54: String = 2.957945 E+2089876

I'm not interesting in result, fibanchy numbers is just example of infinte stream and fold operation over it. I like collection syntax more then recursive definition since I can get all nice collection methods... Other examples of stream: All files on disk, iterator over several gigabite database etc. — yura, Sep 05 '11 at 18:28
I know you're not interested in the result, but my point is that Stream / Iterator with fold operations might not be the best approach, depending on your problem. Recursion is often cleaner: see for example Odersky's answer here: http://stackoverflow.com/questions/7293617/split-up-a-list-at-each-element-satisfying-a-predicate-scala — Luigi Plinge, Sep 07 '11 at 16:27

Why stream fold operation throws Out of memory exception?

5 Answers5