
I'm doing some of the Project Euler problems (not as homework, just for fun/learning), and I'm learning Haskell. One of the problems was to find the longest Collatz sequence with a starting number under 1 million (http://projecteuler.net/problem=14).

So anyhow, I was able to do it, and my algorithm works and gets the right answer rather rapidly when compiled. However, it uses recursion that goes 1,000,000 deep.

So my question is: did I do this right? As is, is this the proper Haskell way to do it? How could I make it faster? Also, regarding memory usage, how is the recursion actually implemented at the low level? As in, how does it use memory?

(SPOILER ALERT: Don't look at this if you want to solve Project Euler's problem #14 on your own without looking at the answer.)

--haskell script --problem: find the longest collatz chain for a starting number less than 1 million.

collatzLength x| x == 1 = 1
               | otherwise = 1 + collatzLength(nextStep x)


longestChain (num, numLength) bound counter
           | counter >= bound = (num, numLength)
           | otherwise = longestChain (longerOf (num,numLength)
             (counter,   (collatzLength counter)) ) bound (counter + 1)
           --I know this is a messy function, but I was doing this problem just 
           --for myself, so I didn't bother making some utility functions for it.
           --also, I split the big line in half to display on here nicer, would
           --it actually run with this line split?


longerOf (a1,a2) (b1,b2)| a2 > b2 = (a1,a2)
                        | otherwise = (b1,b2)

nextStep n | mod n 2 == 0 = (n `div` 2)
           | otherwise = 3*n + 1

main = print (longestChain (0,0) 1000000 1)

The program runs in about 7.5 seconds when compiled with -O2.

So, any advice/suggestions? I want to try to get the program to run faster with less memory usage, and I want to do it in a very Haskellian (that should be a word) way.

Thanks in advance!

Nathan
  • I would try making `collatzLength` iterative (tail recursive). Right now you build the result using recursion `1 + (1 + (1 + ...))`. Apply the same tail recursion technique that you have for `longestChain`. – akonsu Jun 17 '13 at 19:41
  • http://hpaste.org/90090 – Will Ness Jun 18 '13 at 13:14

1 Answer


Edit to Answer Questions

did I do this right?

Almost. As the comments said, you build a big thunk of 1+(1+(1+...)): use a strict accumulator instead, or a higher-level function that handles this for you. There are other minor things, like defining your own function to compare the second elements of the pairs instead of using maximumBy (comparing snd), but that's more stylistic.
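To make the maximumBy (comparing snd) suggestion concrete, here is a sketch of that higher-level style on a small bound (the real problem uses 1000000; collatzLength here is a plain recursive version, not the strict-accumulator one shown later):

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Length of the Collatz chain starting at n, counting both endpoints.
collatzLength :: Int -> Int
collatzLength 1 = 1
collatzLength n
  | even n    = 1 + collatzLength (n `quot` 2)
  | otherwise = 1 + collatzLength (3 * n + 1)

-- Pair every starting number with its chain length and pick the
-- pair with the largest second component.
longestChain :: Int -> (Int, Int)
longestChain bound =
  maximumBy (comparing snd) [ (n, collatzLength n) | n <- [1 .. bound - 1] ]

main :: IO ()
main = print (longestChain 10)  -- prints (9,20)
```

This trades the hand-rolled longerOf/counter loop for a list comprehension plus a standard library fold, which is often considered the more idiomatic shape.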

As is, is this the proper Haskell way to do it?

It is acceptably idiomatic Haskell code.

How could I make it faster?

See my benchmarks below. The most common answers to Euler performance questions are:

  • Use -O2 (as you did)
  • Try -fllvm (GHC's native code generator is sub-optimal)
  • Use worker/wrappers to reduce arguments or, in your case, leverage an accumulator.
  • Use fast/unboxable types (Int instead of Integer when you can, Int64 if needed for portability, etc.)
  • Use rem instead of mod when all the values are positive. For your case it is also useful to know (or discover) that div tends to compile to something slower than quot.
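For the last two bullets, a sketch of the step function written with rem and quot (valid here because every value in a Collatz chain starting from a positive number stays positive):

```haskell
-- rem/quot agree with mod/div on non-negative operands but compile
-- to cheaper machine code, since they need no sign correction.
nextStep :: Int -> Int
nextStep n
  | n `rem` 2 == 0 = n `quot` 2
  | otherwise      = 3 * n + 1

main :: IO ()
main = print (map nextStep [6, 7])  -- prints [3,22]
```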

Also, with memory usage, how is the recursion actually implemented on the low-level? As in how does it use memory?

Both these questions are very broad. Complete answers would likely need to address lazy evaluation, tail call optimization, worker transformations, garbage collection, etc. I suggest you explore these answers in more depth over time (or hope someone here writes the complete answer I'm avoiding).
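One small, self-contained illustration of the thunk-buildup issue (this is the standard foldl vs. foldl' example, not your code): the lazy left fold accumulates unevaluated (+) applications on the heap, while the strict one forces the running total at every step, which is the same idea as the bang-pattern accumulator in the code below.

```haskell
import Data.List (foldl')

lazySum, strictSum :: [Int] -> Int
lazySum   = foldl  (+) 0   -- builds 0+(1+(2+...)) as a chain of thunks
strictSum = foldl' (+) 0   -- forces the accumulator eagerly, constant space

main :: IO ()
main = print (strictSum [1 .. 1000000])  -- prints 500000500000
```

Both return the same value; the difference shows up in memory usage (and, for long enough lists, a stack overflow from forcing the lazy chain).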

Original Post - Benchmark numbers

Original:

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m5.971s
user    0m5.940s
sys 0m0.019s

Use a worker function with an accumulator for collatzLength:

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m5.617s
user    0m5.590s
sys 0m0.012s

Use Int instead of letting things default to Integer - it's also easier to read with type signatures!

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m2.937s
user    0m2.932s
sys 0m0.001s

Use rem and not mod:

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m2.436s
user    0m2.431s
sys 0m0.001s

Use quotRem and not rem then div:

$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)

real    0m1.672s
user    0m1.669s
sys 0m0.002s

This is all very much like a previous question: Speed comparison with Project Euler: C vs Python vs Erlang vs Haskell

EDIT: and yes, as Daniel Fischer suggests, using bit ops of .&. and shiftR improves on quotRem:

$ ghc -O2 so.hs ; time ./so
(837799,525)

real    0m0.314s
user    0m0.312s
sys 0m0.001s

Or you can just use LLVM and let it do its magic (NB this version still uses quotRem):

$ time ./so
(837799,525)

real    0m0.286s
user    0m0.283s
sys 0m0.002s

LLVM actually does well: as long as you avoid the hideousness that is mod, it optimizes the guard-based code using rem just as well as the hand-optimized .&. with shiftR.

The result is around 20x faster than the original.

EDIT: People are surprised that quotRem performs as well as bit manipulation when the type is Int. The code is included below, but I'm not clear on why this is surprising: just because a value can be negative doesn't mean it can't be handled with very similar bit manipulations that could have identical cost on the right hardware. All three versions of nextStep seem to perform identically (ghc -O2 -fforce-recomp -fllvm, GHC 7.6.3, LLVM 3.3, x86-64).

{-# LANGUAGE BangPatterns, UnboxedTuples #-}

import Data.Bits

collatzLength :: Int -> Int
collatzLength x| x == 1    = 1
               | otherwise = go x 0
 where
    go 1 a  = a + 1
    go x !a = go (nextStep x) (a+1)


longestChain :: (Int, Int) -> Int -> Int -> (Int,Int)
longestChain (num, numLength) bound !counter
   | counter >= bound = (num, numLength)
   | otherwise = longestChain (longerOf (num,numLength) (counter, collatzLength counter)) bound (counter + 1)
           --I know this is a messy function, but I was doing this problem just 
           --for myself, so I didn't bother making some utility functions for it.
           --also, I split the big line in half to display on here nicer, would
           --it actually run with this line split?


longerOf :: (Int,Int) -> (Int,Int) -> (Int,Int)
longerOf (a1,a2) (b1,b2)| a2 > b2 = (a1,a2)
                        | otherwise = (b1,b2)
{-# INLINE longerOf #-}

nextStep :: Int -> Int
-- Version 'bits'
nextStep n = if 0 == n .&. 1 then n `shiftR` 1 else 3*n+1
-- Version 'quotRem'
-- nextStep n = let (q,r) = quotRem n 2 in if r == 0 then q else 3*n+1
-- Version 'almost the original'
-- nextStep n | even n = quot n 2
--            | otherwise  = 3*n + 1
{-# INLINE nextStep #-}


main = print (longestChain (0,0) 1000000 1)
Thomas M. DuBuisson
    Now use `n .&. 1 == 0` and ``n `shiftR` 1`` instead of `quotRem`. – Daniel Fischer Jun 17 '13 at 20:00
  • @DanielFischer I read many times that a decent compiler should do such an optimization itself. Do you know how it is with GHC (perhaps with the LLVM backend)? Does using explicit bit shift operations really help? – Petr Jun 17 '13 at 20:29
  • 2
    @PetrPudlák GHC has not yet been taught many of such optimisations, too few people working on it, their time is spent with higher-level optimisations and type system trickery, it does not do that by itself (so far). The LLVM backend does that optimisation when you use the right type (and last I checked, only for `quot/rem`, not for `div/mod`). But of course, with `Int`, it has to take into account the possibility of negative numbers, so you can beat it (not by much) using the domain-knowledge that only positive numbers occur [unless `Int` is 32 bits, in which case you have overflow]. – Daniel Fischer Jun 17 '13 at 20:36
  • 1
    At least on my box, even with the LLVM backend, simply shifting beats the LLVM optimisation (by a surprisingly large margin, ~25%), since LLVM doesn't know that no negative numbers occur. The difference disappears when using `Word`. – Daniel Fischer Jun 17 '13 at 21:11
  • @Thomas How bit operations with LLVM compares with `quotRem` with LLVM? Can you add that timing info too. – Satvik Jun 17 '13 at 21:40
  • 1
    @Satvik they are indistinguishable. I assume LLVM recognizes the assembly equivalent of `quotRem x 2` and rewrites it. – Thomas M. DuBuisson Jun 17 '13 at 23:13
  • Thomas, indistinguishable with `Int`? That would surprise me a little (not as much as the large difference I got here), since it would need to cater for negative dividends. For `Word`, it is to be expected to be identical, since then the division is a single shift. – Daniel Fischer Jun 17 '13 at 23:46
  • @ThomasM.DuBuisson My surprise is because a) To account for negative numbers, you need two shifts and one add, while for non-negative numbers, you're golden with just one shift. Well, I would expect that to make a small difference, if it is at all measurable, except that b) I got a ~25% difference between just masking and shifting by hand and `quotRem` or `even` + `quot` with the LLVM backend. (And I just checked, with the manual shift-add-shift signed division by 2, I get pretty much the same time as LLVM + `quot(Rem)`, so the difference is due to the additional add and shift, at least ... – Daniel Fischer Jun 18 '13 at 12:35
  • ... almost all of it.) So I'm a bit surprised that something that makes a (surprisingly) large difference on my box makes apparently none on yours. Could be the different CPUs, could be different compilers. (Of course it's possible that your LLVM notices that the division is only done on even numbers, so the plain single shift suffices also for negative numbers. But that would also surprise me a little, since mine [3.2] doesn't.) – Daniel Fischer Jun 18 '13 at 12:40
  • @WillNess On a 32-bit system, _this particular problem_ would lead to overflow that leads to infinite negative cycles when using `Int`. As it so happens, for this particular problem, the overflows that happen when using `Word` on a 32-bit system wouldn't change the computed answer, so one could cheat and use `Word` for speed on a 32-bit system. But if one doesn't want to cheat on a 32-bit system, one needs `Int64`/`Word64` or `Integer`. – Daniel Fischer Jun 18 '13 at 12:49