1

I'm currently working on the 58th problem from Project Euler.

While my solution runs fine when I set the limit to 8%, it goes beyond one minute after that.

This surprises me, because I use a library for the primes generation and what I do by myself looks linear to me, and fine.

Though obviously I've missed plenty of huge thunks, quadratic (or worse) functions, bottlenecks etc before, and it must be the same here again.

What I'd like is a chek of my solution to know if the code is wrong, or if it's the way I handle the problem that's stupid. In the latter case, I'd like not to be spoiled.

I consider my question not to be a code review one, because basically my code doesn't work for its purpose, but I might be wrong, in which case please transfer the question to the appropriate SE site.

I tried literate programming for this one, so I'll just dump my file to provide further explanations.

My .lhs (well, formatted for SO)

We don't handle the primes ourselves and we want help from the compiler.

{-# OPTIONS_GHC -Wall #-}

import Data.Numbers.Primes

First, we construct in diags the stream of the numbers that are on the diagonals. To do that, we notice that those numbers get incremented 4 times in a row by a certain number, and then again with this number + 2, etc.
replicate 4 [2, 4..] will give us a list of increments.

Then we just need to aggregate all that with a scanl (+) and there we have our list.

primesDiags :: [Int]
primesDiags = go diags primes
  where
    diags = scanl (+) 1 . concatMap (replicate 4) $ [2, 4..] :: [Integer]

Once we have this list, we map all the numbers to 0 if the number is composite and 1 if the number is prime. To do that efficiently, we use a library to provide the stream of primes and map the two lists by running through them once only.

    go dss@(d:ds) pss@(p:ps) | d < p = 0:go ds pss
                             | d > p = go dss ps
                             | otherwise = 1:go ds ps

Then we tell ghc we know why our pattern matching is incomplete

    go _ _ = undefined -- we operate on streams

Now we have everything we need to answer the problem. Next step is to find at which square we cross the specific limit we seek to spot. To do that we simply update an accumulator to represent the number of primes we met up until this point and we keep track of the index of the square we're at too.

We start the recursion at 2 to just keep track of factorized behaviour. That's why we skip one item in primesDiags and since this item is 1 we set our acc to 0 (1 isn't prime).

nthSquare :: Int
nthSquare = go 2 (tail primesDiags) 0
  where
    go n (w:x:y:_:ds) primeN | 8 * primeN' < compositeN' = n
                             | otherwise = go (n + 1) ds primeN'
      where
        total = 4 * n - 3
        delta = sum [w, x, y]
        primeN' = primeN + delta
        compositeN' = total - primeN'
    go _ _ _ = undefined -- we operate on streams

Then, once we've spot the correct square, its side length is obtained by doubling its index and substracting one.

main :: IO ()
main = print $ nthSquare * 2 - 1

Here is a paste if you want to play with the code.

m09
  • 7,490
  • 3
  • 31
  • 58
  • one little nit: why is diags a `[Integer]` but primesDiag a `[Int]`? – ErikR Oct 20 '12 at 14:01
  • because primesDiag contains only `0`s and `1`s (could have used booleans but ints allow me to use `sum` instead of `length . filter id`) – m09 Oct 20 '12 at 14:10
  • @Mog I think the point is "Why isn't `diags` also an `[Int]`?" - that makes both versions a bit faster. – Daniel Fischer Oct 20 '12 at 14:11
  • 1
    Tangent: If you've formatted your code for SO, you can still get ghci to compile it which is [what I've been doing for a while](http://stackoverflow.com/questions/12666723/i-taught-ghci-to-compile-my-stackoverflow-posts-can-i-make-it-slicker) but when I get round to it, I'll make a drop-in replacement for unlit, and support for this SO's style of literate programming will be slicker. – AndrewC Oct 20 '12 at 14:26
  • Replacing `[Integer]` with `[Int]` in the definition of `diags` reduced the run time from 33 secs to 27 secs on my machine. – ErikR Oct 20 '12 at 14:29
  • @user5402 my rule of thumbs for the `Int`/`Integer` debate when doing some Euler problems is to always use `Integer` unless I know for sure that `Int` will not overflow for 32-bits systems, because I don't like a correct-only-if-executed-on-that-particular-system solution. Here I had no such knowledge. – m09 Oct 20 '12 at 14:51

1 Answers1

10

Not that the rest cannot be improved at all, but the library you used for prime generation isn't too fast. Using your code, I got

$./euler58 +RTS -s -RTS
11297
  30,958,460,200 bytes allocated in the heap
   4,671,021,104 bytes copied during GC
         495,832 bytes maximum residency (2822 sample(s))
          47,664 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     56551 colls,     0 par    5.96s    5.97s     0.0001s    0.0004s
  Gen  1      2822 colls,     0 par    1.60s    1.61s     0.0006s    0.0014s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   21.47s  ( 21.50s elapsed)
  GC      time    7.56s  (  7.57s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   29.04s  ( 29.08s elapsed)

  %GC     time      26.1%  (26.0% elapsed)

  Alloc rate    1,441,874,868 bytes per MUT second

  Productivity  73.9% of total user, 73.8% of total elapsed

Changing the import to

import Math.NumberTheory.Primes

(from the arithmoi package - disclaimer, I'm the author) and the first line of primesDiags to

primesDiags = go diags (map fromInteger primes)

the result is

$ ./aeuler58 +RTS -s -RTS
11297
   1,986,441,440 bytes allocated in the heap
      25,254,256 bytes copied during GC
         220,328 bytes maximum residency (34 sample(s))
         158,984 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3761 colls,     0 par    0.06s    0.06s     0.0000s    0.0002s
  Gen  1        34 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.96s  (  0.96s elapsed)
  GC      time    0.06s  (  0.06s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.02s  (  1.02s elapsed)

  %GC     time       5.9%  (5.9% elapsed)

  Alloc rate    2,064,058,991 bytes per MUT second

  Productivity  94.1% of total user, 94.1% of total elapsed

which shows that your part of the code is decent. You can improve it by using the fact that on one of the spikes there are the squares, so that that needn't be considered at all.

Another point is that instead of computing all primes, you could just inspect the values on the diagonals (ignoring the squares) and check which of them are prime if you have a fast prime check. Whether that's faster depends on the prime check.

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
  • It seems strange to me that there's such a high quality requirement on the primes generation. Previous problems were solvable with my hand made stupid algorithm so it's kind of a big jump. Thanks though I didn't hear about your lib and it seems awesome. Though, right now it makes my system swap and freeze but if I edit my algo into a "more fold-like" type of thing it should be fine I guess :) – m09 Oct 20 '12 at 14:47
  • What makes your system swap and freeze? If you're referencing `primes` from different places, the list will be kept, since it's a CAF. For such cases, use `sieveFrom` to avoid sharing a huge list, or share the more compact list of `PrimeSieve`s and feed that to `primeList` where needed. – Daniel Fischer Oct 20 '12 at 14:52
  • To be more precise, it swaps for a limit set to `10` instead of `8` (in the guard of the go function). edit post-you-comment: I'll look into it, thanks :) – m09 Oct 20 '12 at 14:52
  • Note that for the problem, you need 9, not 10. Edit: That said, even with 10, I get `Total time 33.66s ( 33.70s elapsed)` with `307,560 bytes maximum residency`. – Daniel Fischer Oct 20 '12 at 14:53
  • Hum that's embarassing. I was calculating a primes/composites ratio while a primes/total ratio is needed. I guess that actually was the main problem. (well, not the main, but 10 is really out of reach with my former program while 9 isn't). – m09 Oct 20 '12 at 14:58
  • Well, that contributes, but with 9, `Data.Numbers.Primes.primes` takes three minutes here vs. 5 seconds for my sieve. – Daniel Fischer Oct 20 '12 at 15:01
  • yup your lib is a great improvement, thanks a lot for the help, it's appreciated :) – m09 Oct 20 '12 at 15:03