9

I've compiled this program and am trying to run it.

import Data.List
import Data.Ord
import qualified Data.MemoCombinators as Memo

collatzLength :: Int -> Int
collatzLength = Memo.arrayRange (1, 1000000) collatzLength'
  where
    collatzLength' 1 = 1
    collatzLength' n | odd n  = 1 + collatzLength (3 * n + 1)
                     | even n = 1 + collatzLength (n `quot` 2)

main = print $ maximumBy (comparing fst) $ [(collatzLength n, n) | n <- [1..1000000]]

I'm getting the following from GHC

Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

I assume this is one of the "space overflow" things I've been hearing about. (I'm pretty new to Haskell.) How do I fix it? Do I have to rewrite collatzLength to be tail recursive?

afkbowflexin
  • 4,029
  • 2
  • 37
  • 61
  • How are running this? It works for me when compiled with `ghc -O`, using ghc-7.0.4. – John L Aug 21 '11 at 20:19
  • @yairchu: I disagree; none of the answers are really great but Henning's is accurate. The code as given has different behavior depending on the bit size; the stack overflow appears with 32-bit ints (either from 32-bit ghc or explicitly using `Int32`). – John L Aug 21 '11 at 21:32
  • @John L: got it, I misunderstood. In that case, hammar's answer is the comprehensive one and should be marked. – yairchu Aug 21 '11 at 21:41
  • Which optimizations, specifically, are happening with -O or -O2 that magically solve this problem? – Dan Burton Aug 22 '11 at 04:31

4 Answers4

11

As the author of the code in question, I am now slightly embarrassed because it has not one but two possible stack overflow bugs.

  1. It uses Int. On a 32-bit system this will overflow, as the Collatz sequence can go quite a bit higher than the starting number. This overflow can cause infinite recursion as the function jumps back and forth between negative and positive values.

    In the case of numbers between one and a million, the worst starting point is 704511, which goes as high as 56,991,483,520 before coming back down towards 1. This is well outside the 32-bit range.

  2. It uses maximumBy. This function is not strict, so it will cause a stack overflow when used on long lists. One million elements is more than enough for this to happen with the default stack size. It still works with optimizations enabled, though, due to the strictness analysis performed by GHC.

    The solution is to use a strict version. Since none is available in the standard libraries, we can use the strict left fold ourselves.

Here is an updated version which should (hopefully) be stack overflow-free, even without optimizations.

import Data.List
import Data.Ord
import qualified Data.MemoCombinators as Memo

collatzLength :: Integer -> Integer
collatzLength = Memo.arrayRange (1,1000000) collatzLength'
  where
    collatzLength' 1 = 1
    collatzLength' n | odd n  = 1 + collatzLength (3 * n + 1)
                     | even n = 1 + collatzLength (n `quot` 2)

main = print $ foldl1' max $ [(collatzLength n, n) | n <- [1..1000000]]
hammar
  • 138,522
  • 17
  • 304
  • 385
7

Here's a shorter program that fails in the same way:

main = print (maximum [0..1000000])

Yep.

$ ghc --make harmless.hs && ./harmless
[1 of 1] Compiling Main             ( harmless.hs, harmless.o )
Linking harmless ...
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

With -O2 it works. What do I make of it? I don't know :( These space mysteries are a serious gotcha.

Edit:

Thx to hammar for pointing out the culprit.

Changing your program to use

maximum' = foldl1' max

Makes it work without -O2. The implementation of Prelude's maximum is lazy and so doesn't quite work for long lists without compiler magic dust.

yairchu
  • 23,680
  • 7
  • 69
  • 109
  • 2
    Indeed, `maximum` and `maximumBy` are both non-strict, so without the strictness analysis this will cause stack overflows. Good catch! I got a manual version using the strict `foldl1'` to work without optimizations. – hammar Aug 21 '11 at 21:00
  • 1
    @hammar, that makes sense. Write it up as an answer! – hmakholm left over Monica Aug 21 '11 at 21:10
6

I think it's most likely that you're hitting integer overflow with some of the Collatz sequences, and then ending up in an "artificial" cycle that contains overflows but never hits 1. That would produce an infinite recursion.

Remember that some Collatz sequences get very much larger than their starting number before they finally (?) end up at 1.

Try to see if it fixes your problem to use Integer instead of Int.

hmakholm left over Monica
  • 23,074
  • 3
  • 51
  • 73
  • This isn't it (according to theory and a test). There's just no tail recursion and no intelligent optimization so he's making a really deep stack. – Thomas M. DuBuisson Aug 21 '11 at 20:26
  • @Thomas, That was my first guess too, but [according to Wikipedia](http://en.wikipedia.org/wiki/Collatz_conjecture#Examples), there's no starting point less than a billion that requires more than 1000 steps to reach 1. And a recursion depth of 1000 shouldn't produce an 8 megabyte stack even with the least optimized execution strategy. (Is it the final million-element list comprehension that fails to stay in constant space without optimization?) – hmakholm left over Monica Aug 21 '11 at 20:31
  • Hummm, not sure then. I read the answer swapped (that's FYI if you saw my other comment before I deleted it), so I see the maximum here is a depth of 525. Still, the swap is growing over 70M before completion if you don't use optimization. – Thomas M. DuBuisson Aug 21 '11 at 20:39
  • This seems to be correct. Changing `Int` (64-bit on my system) to `Int32`, I can reproduce the stack overflow. A small test program reveals that the Collatz sequence for 704511 goes as high as 56991483520, which is well out of 32-bit range. – hammar Aug 21 '11 at 20:42
  • It works fine when compiling with `-O2`, so probably not an infinite loop. – yairchu Aug 21 '11 at 20:43
  • @hammer On my 64bit machine even `Int` will result in a stack space overflow. They don't for you? Did you use the same optimization flags? EDIT: ahh, but even with optimization it overflows. I see. – Thomas M. DuBuisson Aug 21 '11 at 20:43
  • @Thomas: I though I had tried that. Seems I had not. Testing some more I find that the 64-bit version fails without `-O2`. The 32-bit version always fails. So both answers may be right. This is using GHC 7.0.2. – hammar Aug 21 '11 at 20:49
4

Use the optimizer (via the -O2 flag) any time you are concerned about performance. GHC's optimizations are hugely important not just to run time but to stack use. I've tested this with GHC 7.2 and optimization takes care of your issue.

EDIT: In addtion, if you're on a 32 bit machine be sure to use Int64 or Word64. You'll overflow the size of a 32 bit int and cause non-termination otherwise (thanks to Henning for this, upvote his answer).

Thomas M. DuBuisson
  • 64,245
  • 7
  • 109
  • 166
  • But how would even a completely naive reduction strategy lead to an overflow here? Is there something inside the memoizer that requires optimization to work? – hmakholm left over Monica Aug 21 '11 at 20:27
  • I had used the -O2 flag. I should've specified that in my post. It turns out, I needed to use the "Integer" type. I forgot that even though my system is 64-bit, the operating system I'm currently on is 32-bit. – afkbowflexin Aug 21 '11 at 20:55