2

I am repeatedly getting a stack overflow on my solution to Project Euler #7 and i have no idea why. Here is my code:

import System.Environment

checkPrime :: Int -> Bool
checkPrime n = not $ testList n [2..n `div` 2]

--testList :: Int -> [Int] -> Bool
testList _ [] = False
testList n xs 
    | (n `rem` (head xs) == 0) = True
    | otherwise  = testList n (tail xs)

primesTill n = sum [1 | x <- [2..n], checkPrime x]
nthPrime n = nthPrime' n 2
nthPrime' n x
    | (primesTill x == n) = x
    | otherwise = nthPrime' n x+1

main = print (nthPrime 10001)
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Asad-ullah Khan
  • 1,573
  • 18
  • 22
  • Have you tried loading this file into GHCi and testing each function individually to find where the stackoverflow is occurring? That'd be where I'd start. Maybe it's in `testList`, or possibly in `nthPrime'`. – bheklilr Sep 09 '14 at 03:07
  • FYI: there are much, much faster ways to find prime numbers. In particular, going all the way up to `n - 1` for `n` is horribly inefficient, an easy change that would give you a decent speed increase would be to just go up to ```n `div` 2```, and your `testList` function working in reverse on a list is not going to work well, try working forwards on the list, or reversing the list if you still want to go largest to smallest. – bheklilr Sep 09 '14 at 03:07
  • @bheklilr that is true but it still complains about stack overflow – Asad-ullah Khan Sep 09 '14 at 03:13
  • which bit complains about it? What function in particular? If it's all of them, then which function is at the bottom of the call tree? Which function doesn't call any other function? – bheklilr Sep 09 '14 at 03:15
  • @bheklilr the function that is complaining is the last method (nthPrime'). – Asad-ullah Khan Sep 09 '14 at 03:16
  • 8
    it might be because you missed some parentheses around `x+1`. Function application is higher precedence than operator application. – bheklilr Sep 09 '14 at 03:19
  • We can check for factors until sqrt(n) other than n/2. http://stackoverflow.com/q/5811151/2610955 @bheklilr – xtreak Sep 09 '14 at 05:26
  • @Wordzilla it's a good point that I'm aware of, and even this is a slow method, but I didn't want to introduce a significant amount of complexity from having to convert between numeric types to get that square root in there. – bheklilr Sep 09 '14 at 11:25

1 Answers1

2

resolving the stackoverflow

As @bheklilr mentioned in his comment the stackoverflow is caused by a wrong evaluation order in the otherwise branch of the nthPrime' function:

nthPrime' n x+1

Will be interpreted as

(nthPrime' n x)+1

Because this expression is called recursively, your call of nthPrime' n 2 will expand into

(nthPrime' n 2)+1+1+1+1+1+1+1+1 ...

but the second parameter will never get incremented and your program collects a mass of unevaluated thunks. The evaluation can only happen if the first parameter is reduced to an Int, but your function is in an endless recursion so this will never take place. All the plus ones are stored on the stack, if there is no more space left you'll get a stackoverflow error.

To solve this problem you need to put parranteses around the x+1 so your recursive call will look like this

 nthPrime' n (x+1)

Now the parameters gets incremented before it is passed to the recursive call.

This should solve your stackoverflow problem, you can try it out with a smaller number e.g. 101 and you'll get the desired result.

runtime optimization

If you test your program with the original value 10001 you may realize that it still won't finish in a reasonable amount of time.

I won't go into the details of fancy algorithms to solve this problems, if you're interested in them you can easily find them online. Instead I'll show you were the problem in your code is and show you a simple solution.

The bottleneck is your nthPrime function:

 primesTill n = sum [1 | x <- [2..n], checkPrime x]
 nthPrime n = nthPrime' n 2
 nthPrime' n x
     | (primesTill x == n) = x
     | otherwise = nthPrime' n (x+1)

This function checks if the number of primes between 2 and x is equal to n. The idea is correct, but it leads to an exponential runtime. The problem is that you recalculate primesTill x for every iteration. To count the primes smaller than x you calculate them all and than sum them up. In the next step for x+1 you forget every thing you know about the numbers between 2 and x and test them all again if they are prime only as a last step you test the if x+1 is prime. Than you repeat this - forget every thing and test all numbers again - until you are finished.

Wouldn't it be great if the computer could remember the primes it has already found?

There are many possibilities to do this I'll use a simple infinite list, if you are interested in other approaches you can search for the terms memoization or dynamic programming.

We start with the list comprehension you used in primesTill:

 [1 | x <- [2..n], checkPrime x]

This calculates all primes between 2 and n, but immediately forgets the prime number and replaces it with 1, so the first step will be to keep the actual numbers.

 [x | x <- [2..n], checkPrime x]

This gives us a list of all prime numbers between 2 and n. If we had a sufficiently large list of prime numbers we could use the index function !! to get the 10001st prime number. So we need to set n to a really really big number, to be sure that the filtered list is long enough?

Lazy evaluation to the rescue!

Lazy evaluation in haskell allows us to build an infinite list, that is only evaluated as much as needed. If we don't supply an upper bound to a list generator it will build such an infinite list for us.

 [x | x <- [2..], checkPrime x]

Now we have a infinite list of all prime numbers. We can bind it to the a name e.g. primes and use it to define nthPrime

primes = [x | x <- [2..], checkPrime x]
nthPrime n = primes !! n

Now you can compile it with ghc -O2, run it and the result will be promptly delivered to you.

bmaderbacher
  • 1,261
  • 1
  • 10
  • 19