Why is this Haskell expression so slow?

Question

I'm working on Project Euler Problem 14. Here's my solution.

import Data.List

collatzLength :: Int->Int
collatzLength 1 = 1
collatzLength n | odd n = 1 + collatzLength (3 * n + 1)
                | even n = 1 + collatzLength (n `quot` 2)

maxTuple :: (Int, Int)->(Int, Int)->Ordering
maxTuple (x1, x2) (y1, y2)  | x1 > y1 = GT
                | x1 < y1 = LT
                | otherwise = EQ

I'm running the following out of GHCi

maximumBy maxTuple [(collatzLength x, x) | x <- [1..1000000]]

I know that if Haskell evaluated strictly, the time on this would be something like O(n³). Since Haskell evaluates lazily though, it seems like this should be some constant multiple of n. This has been running for nearly an hour now. Seems very unreasonable. Does anyone have any idea why?

I don't see why Haskell's laziness should make any difference to the complexity of this algorithm. — sepp2k, Aug 20 '11 at 21:32
Your function `collatzLength` is not tail-recursive. This slows down the function and causes unneeded allocation. And your `maxTuple` is the same as `comparing fst`. — fuz, Aug 20 '11 at 21:32
@sepp I don't really know how List Comprehensions are implemented. If they're using Map, if Haskell evaluated strictly, it seems like it would have to pass through the list multiple times. — afkbowflexin, Aug 20 '11 at 21:34
@Josh That could indeed be a reason. IIRC this was an issue in other programs too, although I can't recall where. — fuz, Aug 20 '11 at 21:35
Yeah, I didn't really think too hard about the evaluation. I rewrote collatzLength to be tail recursive and it's still slow. Also, I didn't compile this. It didn't seem like I should have to. — afkbowflexin, Aug 20 '11 at 21:43
@Josh: Post your update as a new question. It's a linking problem that's not really related to the topic here. — hammar, Aug 21 '11 at 19:34
Hammar, I've removed the edit. The new post is here http://stackoverflow.com/questions/7140617/ghc-linking-problem — afkbowflexin, Aug 21 '11 at 19:38

hammar · Accepted Answer · 2011-08-21T21:49:23.407

You're assuming that the collatzLength function will be memoized. Haskell does not do automatic memoization. You'll need to do that yourself. Here's an example using the data-memocombinators package.

import Data.List
import Data.Ord
import qualified Data.MemoCombinators as Memo

collatzLength :: Integer -> Integer
collatzLength = Memo.arrayRange (1,1000000) collatzLength'
  where
    collatzLength' 1 = 1
    collatzLength' n | odd n  = 1 + collatzLength (3 * n + 1)
                     | even n = 1 + collatzLength (n `quot` 2)

main = print $ foldl1' max $ [(collatzLength n, n) | n <- [1..1000000]]

This runs in about 1 second when compiled with -O2.

score 1 · Answer 2 · answered Aug 20 '11 at 21:34

1

For being able to find a maximum of a list, the whole list needs to be evaluated.

So it will calculate collatzLength from 1 to 1000000 and collatzLength is recursive. The worst thing is, that your definition of collatzLength is even not tail-recursive.

answered Aug 20 '11 at 21:34

Johannes Weiss

52,533
16
102
136

This answer misses the point. The entire list *does* need to be evaluated, but by recording (memoizing) the results of `collatzLength`, it can be greatly sped up precisely *because* it is recursively defined. I think that's the nugget of wisdom that this Euler problem is supposed to convey. – Dan Burton Aug 21 '11 at 01:02
okay, but he asked about lazy evaluation. And I told that lazy evaluation will not make it faster here because everything needs to be evaluated (at least once) anyway. But you're right! The problem is not evaluating everything once, the problem is evaluation everything _more_ than once. – Johannes Weiss Aug 21 '11 at 09:28
Ah, I see where you were coming from now. You, too, are correct: the whole list needs to be evaluated, therefore laziness won't really help; the poster's confusion apparently came from not clearly understanding lazy evaluation, assuming it included memoization magic which it does not. – Dan Burton Aug 22 '11 at 04:22
yeah, that's it, I think :-). He perhaps mixed up purity and laziness? Pure functions COULD have auto-memoization... – Johannes Weiss Aug 23 '11 at 11:30

score 0 · Answer 3 · answered Aug 21 '11 at 12:23

0

cL is short for collatzLength

cL!!n stands for collatzLength n

cL :: [Int]
cL = 1 : 1 : [ 1 + (if odd n then cL!!(3*n+1) else cL!!(n `div` 2)) | n <- [2..]]

Simple test:

ghci> cL !! 13
10

answered Aug 21 '11 at 12:23

wenlong

1,434
1
13
22

Basically, this uses a list that memoizes `collatzLength` as it constructs itself. But it's rather mind-boggling when you try to trace the order of execution for `cL !! 3`, which depends on `cL !! 10`, a later element in the list which will get evaluated earlier! – Dan Burton Aug 22 '11 at 04:28
@Dan Burton, the worst is that it runs slowly, maybe cause by (!!). – wenlong Aug 22 '11 at 14:11
Yes, `!!` adds a portion of *O(n)* time complexity into the heart of the algorithm (in contrast, a memoized function should be *O(1)* to invoke or *O(log n)* at worst for the lookup), so for large `n` you will definitely see a big slowdown. – Dan Burton Aug 22 '11 at 15:30

Why is this Haskell expression so slow?

3 Answers3

Linked