4

I have answered Project Euler Question 7 very easily using Sieve of Eratosthenes in C and I had no problem with it.

I am still quite new to F# so I tried implementing the same technique

let prime_at pos =
  let rec loop f l =
    match f with 
    | x::xs -> loop xs (l |> List.filter(fun i -> i % x <> 0 || i = x))
    | _ -> l

  List.nth (loop [2..pos] [2..pos*pos]) (pos-1)

which works well when pos < 1000, but will crash at 10000 with out of memory exception

I then tried changing the algorithm to

let isPrime n = n > 1 && seq { for f in [2..n/2] do yield f } |> Seq.forall(fun i -> n % i <> 0)

seq {for i in 2..(10000 * 10000) do if isPrime i then yield i} |> Seq.nth 10000 |> Dump

which runs successfully but still takes a few minutes.

If I understand correctly the first algorithm is tail optimized so why does it crash? And how can I write an algorithm that runs under 1 minute (I have a fast computer)?

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
happygilmore
  • 3,008
  • 4
  • 23
  • 37

2 Answers2

5

Looking at your first attempt

let prime_at pos =
  let rec loop f l =
    match f with 
    | x::xs -> loop xs (l |> List.filter(fun i -> i % x <> 0 || i = x))
    | _ -> l

  List.nth (loop [2..pos] [2..pos*pos]) (pos-1)

At each loop iteration, you are iterating over and creating a new list. This is very slow as list creation is very slow and you don't see any benefits from the cache. Several obvious optimisations such as the factor list skipping the even numbers, are skipped. When pos=10 000 you are trying to create a list which will occupy 10 000 * 10 000 * 4 = 400MB of just integers and a further 800MB of pointers (F# lists are linked lists). Futhermore, as each list element takes up a very small amount of memory there will probably be significant overhead for things like GC overhead. In the function you then create a new list of smiliar size. As a result, I am not surprised that this causes OutOfMemoryException.

Looking at the second example,

let isPrime n = 
    n > 1 && 
    seq { for f in [2..n/2] do yield f } 
    |> Seq.forall(fun i -> n % i <> 0)

Here, the problem is pretty similar as you are generating giant lists for each element you are testing.

I have written a quite fast F# sieve here https://stackoverflow.com/a/12014908/124259 which shows how to do this faster.

Community
  • 1
  • 1
John Palmer
  • 25,356
  • 3
  • 48
  • 67
3

As already mentioned by John, your implementation is slow because it generates some temporary data structures.

  • In the first case, you are building a list, which needs to be fully created in memory and that introduces significant overhead.

  • In the second case, you are building a lazy sequence, which does not consume memory (because it is build while it is being iterated), but it still introduces indirection that slows the algorithm down.

In most cases in F#, people tend to prefer readability and so using sequences is a nice way to write the code, but here you probably care more about performance, so I'd avoid sequences. If you want to keep the same structure of your code, you can rewrite isPrime like this:

let isPrime n = 
  let rec nonDivisible by =
    if by = 1 then true        // Return 'true' if we reached the end
    elif n%by = 0 then false   // Return 'false' if there is a divisor
    else nonDivisible (by - 1) // Otherwise continue looping

  n > 1 && nonDivisible (n/2)

This just replaces the sequence and forall with a recursive function nonDivisible that returns true when the number n is not divisible by any number between 2 and n/2. The function first checks the two termination cases and otherwise performs a recursive call..

With the original implementation, I'm able to find 1000th prime in 1.5sec and with the new one, it takes 22ms. Finding 10000th prime with the new implementation takes 3.2sec on my machine.

Tomas Petricek
  • 240,744
  • 19
  • 378
  • 553
  • I'm really impressed that this works! I understand the algorithm but could you please explain more why this is significantly faster than sequences? I read somewhere that F# engine poorly implements IEnumerable compared to C# (which uses state machines). Could this be why? – happygilmore May 20 '14 at 13:52
  • @happygilmore : Current versions of F# also produce state machines for sequence expressions; whatever you read must have been quite outdated. – ildjarn May 21 '14 at 04:32
  • @happygilmore do you mean the last paragraph in GordonBGoods answer ?http://stackoverflow.com/questions/12014224/when-generating-primes-in-f-why-is-the-sieve-of-erosthenes-so-slow-in-this-p/18035652#18035652 – Goswin Rothenthal May 21 '14 at 11:00
  • @happygilmore: since you're doing Project Euler, another optimization to the isPrime function is that you only need to check numbers up to the square root of your target. If a number doesn't have a divisor below or equal its square root, the number is prime. That's significantly lower than n/2. – Cesar Mendoza May 22 '14 at 19:22