5

I've been a bit surprised by GHC throwing stack overflows if I'd need to get value of large list containing memory intensive elements. I did expected GHC has TCO so I'll never meet such situations.

To most simplify the case look at the following straightforward implementations of functions returning Fibonacci numbers (taken from HaskellWiki). The goal is to display millionth number.

import Data.List

# elegant recursive definition
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

# a bit tricky using unfoldr from Data.List
fibs' = unfoldr (\(a,b) -> Just (a,(b,a+b))) (0,1)

# version using iterate
fibs'' = map fst $ iterate (\(a,b) -> (b,a+b)) (0,1)

# calculate number by definition
fib_at 0 = 0
fib_at 1 = 1
fib_at n = fib_at (n-1) + fib_at (n-2)

main = do
    {-- All following expressions abort with
        Stack space overflow: current size 8388608 bytes.
        Use `+RTS -Ksize -RTS' to increase it.
     --}
    print $ fibs !! (10^6)
    print . last $ take (10^6) fibs
    print $ fibs' !! (10^6)
    print $ fibs'' !! (10^6)

    -- following expression does not finish after several 
    -- minutes
    print $ fib_at (10^6)

The source is compiled with ghc -O2.

What am I doing wrong ? I'd like to avoid recompiling with increased stack size or other specific compiler options.

Raeez
  • 2,423
  • 15
  • 12
David Unric
  • 7,421
  • 1
  • 37
  • 65
  • @David: read this for clarifying about TCO in haskell: http://www.haskell.org/pipermail/haskell-cafe/2009-March/058607.html – Francesco Jun 23 '11 at 15:03
  • `fibs !! (10^6)` ...that is a very large number. – C. A. McCann Jun 23 '11 at 15:15
  • Each of the three lookups on its own should work in almost constant heap space - that you mention `fibs` twice will cause it to be retained, however. And your version of `fib_at` has exponential complexity. No amount of tail call optimization will help you there... – Peter Wortmann Jun 23 '11 at 15:18
  • @camccann> Indeed. 120719474515914100282640417445683502254279809033418884477131589123371500059867801 ... 6074509262836314172778883659535102587911419872025962380687123573280832406684390626 (208988 digits). Got with functional-like Ruby code: Enumerator.new { |y| a,b=0,1; loop {y.yield a; a,b=b,a+b} }.with_index {|elt,ix| if ix == 10**6-1 then print elt; break; end} – David Unric Jun 23 '11 at 16:05
  • @David: I get a different result, 1953282128707757731632.... which appears consistent with wolfram alpha. Try my code and let me know. – Francesco Jun 23 '11 at 16:15
  • @David, Why do you care? Just increase the stack space limit above the roof. The limit is only there to prevent some of the infinite-loop programs from eating up all of the computer memory so that they fail earlier. You don't need to add any compiler option either. It's a runtime option: `+RTS -K1000m`. In Haskell one doesn't easily distinguish between stack space and heap, so I see no reason to limit one and prefer the other. – Rotsor Jun 25 '11 at 02:24

3 Answers3

9

These links here will give you a good introduction to your problem of too many thunks (space leaks).

If you know what to look out for (and have a decent model of lazy evaluation), then solving them is quite easy, for example:

{-# LANGUAGE BangPatterns #-}                        

import Data.List                                     

fibs' = unfoldr (\(!a,!b) -> Just (a,(b,a+b))) (0,1) 

main = do                                            
    print $ fibs' !! (10^6)  -- no more stack overflow                   
Raeez
  • 2,423
  • 15
  • 12
  • Thank you. My fault I've believed programmer in Haskell doesn't need to care for such things like stack limits and the like and tweak the code like forced evaluation. Btw. any idea how to use bang-pattern for the first function "fibs = 0 : 1 : zipWith (+) fibs (tail fibs)" to be effective ? – David Unric Jun 23 '11 at 17:48
  • Yep, strict version of zipWith. Thanks Mikhail ! – David Unric Jun 23 '11 at 19:56
3

All of the definitions (except the useless fib_at) will delay all the + operations, which means that when you have selected the millionth element it is a thunk with a million delayed additions. You should try something stricter.

augustss
  • 22,884
  • 5
  • 56
  • 93
1

As other have pointed out, Haskell being lazy you have to force evaluation of the thunks to avoid stack overflow. It appears to me that this version of fibs' should work up to 10^6:

fibs' = unfoldr (\(a,b) -> Just (seq a (a, (b, a + b) )))  (0,1)

I recommend to study this wiki page on Folds and have a look at the seq function.

Francesco
  • 3,200
  • 1
  • 34
  • 46