2

I'm working on problem 14 of Project Euler (http://projecteuler.net/problem=14). I'm trying to use memoization so that I save the length of the sequence for a given number as a partial result. I'm using Data.MemoCombinators for that. The program below produces a stack overflow.

import qualified Data.MemoCombinators as Memo

sL n = seqLength n 1
seqLength = Memo.integral seqLength'
  where seqLength' n sum = if (n == 1)     then sum
                           else if (odd n) then seqLength (3*n+1) (sum+1)
                           else                 seqLength (n `div` 2) (sum+1)

p14 = snd $ maximum $ zip (map sL numbers) numbers
  where numbers = [1..max]
        max     = 999999

The stack overflow should be due to sum+1 being evaluated lazily. How can I force it to be evaluated before each call to seqLength? BTW, is memoization well implemented? I'm more interested in pointing out my Haskell mistakes than in solving the exercise.

Zero Piraeus
  • 56,143
  • 27
  • 150
  • 160
rturrado
  • 7,699
  • 6
  • 42
  • 62
  • 2
    BTW, for this problem you will probably want to use the `arrayRange` combinator -- as the values in the sequence get very large, memo lookups become more expensive (log time for `integral`) and are very unlikely to be reused. Better to ignore memoization for sufficiently large values. – luqui Nov 07 '11 at 00:57

4 Answers4

7

The most common ways of forcing evaluation are to use seq, $! or a bang pattern. However sum+1 is not the culprit here. maximum is. Replacing it with the stricter foldl1' max fixes the stack overflow error.

That taken care of, it turns out that your memoization here is no good. Memo.integral only memoizes the first argument, so you're memoizing partial applications of seqLength', which doesn't really do anything useful. You should get much better results without tail recursion so that you're memoizing the actual results. Also, as luqui points out, arrayRange should be more efficient here:

seqLength = Memo.arrayRange (1, 1000000) seqLength'
  where seqLength' 1 = 1
        seqLength' n | odd n     = 1 + seqLength (3*n+1)
                     | otherwise = 1 + seqLength (n `div` 2)
hammar
  • 138,522
  • 17
  • 304
  • 385
  • @rturrado: It's defined in `Data.List`. For finding functions, I recommend using [Hoogle](http://www.haskell.org/hoogle/?hoogle=foldl1%27) – hammar Nov 07 '11 at 19:46
  • OK, giving this as a correct answer. All the code I've seen in the responses seems so dense to me. So much to learn in only a few lines of code. So much going on backstage. Still to understand memoization, Memo.arrayRange... BTW, I was defining `seqLength :: Int -> Int` and that was still producing a stack overflow, but I've already checked why this was happening in this other answer to the same question: [link](http://stackoverflow.com/questions/7140732/haskell-space-overflow) – rturrado Nov 07 '11 at 23:45
2

I'm not familiar with Data.MemoCombinators, so the generic advice is: try seqLength (3*n+1) $! (sum+1) (the same for even n, of course).

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
2

Why use MemoCombinators when we can exploit laziness? The trick is to do something like

seqLength x = lengths !! x - 1
   where lengths = map g [1..9999999]
         g n | odd n = 1 + seqLength (3 * n + 1)
             | otherwise = 1 + seqLength (n `div` 2)

which should work in a memoized way. [Adapted from the non-tail-recursive solution by @hammar]

Of course, then seqLength is O(n) for the memoized case so it suffers less performance. However, this is remediable! We simply take advantage of the fact that Data.Vector is streamed and has O(1) random access. The fromList and map will be done at the same time (as the map will simply produce thunks instead of the actual values because we are using a boxed vector). We also fallback on a non-memoized version since we can't possibly memoize every possible value.

import qualified Data.Vector as V

seqLength x | x < 10000000 = lengths V.! x - 1
            | odd x = 1 + seqLength (3 * n + 1)
            | otherwise = 1 + seqLength (n `div` 2)
   where lengths = V.map g $ V.fromList [1..99999999]
         g n | odd n = 1 + seqLength (3 * n + 1)
             | otherwise = 1 + seqLength (n `div` 2)

Which should be comparable or better to using MemoCombinators. Don't have haskell on this computer, but if you want to figure out which is better, there's a library called Criterion which is excellent for this sort of thing.

I think using Unboxed Vectors could actually give better performance. It would force everything at once when you evaluate one item (I think) but you need that anyway. Hence you could then just run a foldl' max to get a O(n) solution that should have less constant overhead.

alternative
  • 12,703
  • 5
  • 41
  • 41
  • Note that this will go out of bounds, as the Collatz sequence tends to go up quite a bit before coming back down towards 1. For example, `seqLength 999999` will try to evaluate `seqLength 2999998` which will fail since the vector/list is only 1000000 elements long. – hammar Nov 07 '11 at 13:22
  • @hammar Yes, I realized this after. Perhaps only memoize, say, 10 million, and if it goes higher than that don't use the memoized version? I've placed in a fallback non-memoized version that doesn't use guards. However, I still think the method is pretty applicable. – alternative Nov 07 '11 at 13:50
  • @monadic: However, now your code is equivalent to just replacing `Memo.integral` with `Memo.arrayRange (1, 1000000)` which does this only-memoize-if-within-bounds for you. The main difference is that it uses `array` instead of `vector`. – hammar Nov 07 '11 at 14:03
  • @hammar not too bad for shaving off a dependency :P – alternative Nov 07 '11 at 19:40
1

If memory serves, for this problem you don't need memoization at all. Just use foldl' and bang patterns:

snd $ foldl' (\a n-> let k=go n 1 in if fst a < .... 
  where go n !len | n==1 =  ....

Compile with -O2 -XBangPatterns . It's always better to run stadalone as running compiled code in ghci can introduce space leaks.

Will Ness
  • 70,110
  • 9
  • 98
  • 181