
I read this other post about an F# version of this algorithm. I found it very elegant and tried to combine some of the ideas from the answers.

Although I optimized it to make fewer checks (check only numbers adjacent to multiples of 6) and to leave out unnecessary caching, it is still painfully slow. Calculating the 10,000th prime already takes more than 5 minutes. Using the imperative approach, I can test all 31-bit integers in not much more time.

So my question is whether I am missing something that makes all this so slow. For example, in another post someone speculated that LazyList may use locking. Does anyone have an idea?

As StackOverflow's rules say not to post new questions as answers, I feel I have to start a new topic for this.

Here's the code:

#r "FSharp.PowerPack.dll"

open Microsoft.FSharp.Collections

// every composite 32-bit integer has a prime factor no larger than this
let squareLimit = System.Int32.MaxValue |> float32 |> sqrt |> int

// candidate numbers of the form 6k ± 1: 5, 7, 11, 13, 17, 19, ...
let around6 = LazyList.unfold (fun (candidate, (plus, next)) ->
        if candidate > System.Int32.MaxValue - plus then
            None
        else
            Some(candidate, (candidate + plus, (next, plus)))
    ) (5, (2, 4))

// split a sequence into head and tail (the tail via Seq.skip)
let (|SeqCons|SeqNil|) s =
    if Seq.isEmpty s then SeqNil
    else SeqCons(Seq.head s, Seq.skip 1 s)

// l1 minus l2; both are assumed to be ascending
let rec lazyDifference l1 l2 =
    if Seq.isEmpty l2 then l1 else
    match l1, l2 with
    | LazyList.Cons(x, xs), SeqCons(y, ys) ->
        if x < y then
            LazyList.consDelayed x (fun () -> lazyDifference xs l2)
        elif x = y then
            lazyDifference xs ys
        else
            lazyDifference l1 ys
    | _ -> LazyList.empty

// the sieve: for each prime p up to squareLimit, lazily remove p*p, p*p+2p, ... from the remaining candidates
let lazyPrimes =
    let rec loop = function
        | LazyList.Cons(p, xs) as ll ->
            if p > squareLimit then
                ll
            else
                let increment = p <<< 1
                let square = p * p
                let remaining = lazyDifference xs {square..increment..System.Int32.MaxValue}
                LazyList.consDelayed p (fun () -> loop remaining)
        | _ -> LazyList.empty
    loop (LazyList.cons 2 (LazyList.cons 3 around6))
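
For reference, the 10,000th-prime timing can be reproduced with a simple walker like this (a minimal sketch; the `nth` helper and the Stopwatch call are ad hoc and not part of the original measurement):

// walk the LazyList to the n-th element (0-based)
let rec nth n ll =
    match ll with
    | LazyList.Cons(x, _) when n = 0 -> x
    | LazyList.Cons(_, xs) -> nth (n - 1) xs
    | _ -> failwith "not enough elements"

let sw = System.Diagnostics.Stopwatch.StartNew()
printfn "10,000th prime: %d (%.1f s)" (nth 9999 lazyPrimes) sw.Elapsed.TotalSeconds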
primfaktor
    The slowness is in your `(|SeqCons|SeqNil|)` active pattern, which takes about O(n^2) time. I don't think there's a way to pattern match on sequences, so you're probably better converting it to a LazyList instead. See brian's awesome answer here: http://stackoverflow.com/questions/1306140/f-why-is-using-a-sequence-so-much-slower-than-using-a-list-in-this-example/1306267#1306267 – Juliet Jun 24 '11 at 15:39
  • You might find this of interest. http://stackoverflow.com/questions/4629734/the-sieve-of-eratosthenes-in-f/4631738#4631738 – gradbot Jun 24 '11 at 15:48
  • Objectively, this is an unsolved problem. There is no known way to implement a competitively-efficient pure Sieve of Eratosthenes. You can optimize it a bit but you will never approach the performance of an imperative solution, so it is an academic exercise. If you want to write fast code to solve these kinds of problems, you must embrace impurity. Moreover, I believe the performance gap between pure and impure will never be closed because purity abstracts away precisely the information you need in order to write fast code. – J D Jun 28 '11 at 09:38
  • @JonHarrop It is true that [working with lists](http://www.haskell.org/haskellwiki/Prime_numbers#Tree_merging) the difference is there, and is akin to the difference between integer sorting and comparison sorting; but complexity of pure code can be brought down significantly close to the optimal one (see my answer in this thread). But [working with immutable arrays](http://www.haskell.org/haskellwiki/Prime_numbers#Accumulating_Array) nothing precludes a smart implementation from using destructive update and thus achieving true optimal complexity of imperative implementations (in theory). – Will Ness Jan 01 '12 at 01:55
  • @WillNess: I agree that a sufficiently smart compiler could magically optimize the pure source code to a competitively-efficient impure implementation in theory but we are nowhere near having compilers with that level of sophistication today and I do not believe it will ever happen. This is just the "sufficiently smart compiler" myth from Lisp revisited. In practice, building compilers capable of more optimizations degrades the predictability of the performance of the resulting code to the point that it is practically useless (just look at the Stalin Scheme compiler). – J D Jan 01 '12 at 03:48
  • @JonHarrop yes, of course, in general. But I think it should be guaranteed in any functional language that when a variable is used as a state-passing accumulating argument to a tail-recursive function it will get updated destructively, and we get to define what it means, for each type(class) (uniqueness rules, monads suck!). That is the whole premise of the equational reasoning (for me), nothing magical in that. But even without that, the 1st code I linked to is _very_ _close_ in complexity to imperative implementations, differing only in some `log` factor (+ a huge constant factor of course). – Will Ness Jan 01 '12 at 08:06
  • "nothing magical in that". Does it require whole-program analysis? "a huge constant factor of course". Sure. This is really all about that constant factor. Quicksort in Haskell is asymptotically efficient in theory but thousands of times slower in practice because these "constant" factors can be enormous. And are they really constant? What do the cache complexities look like? Probably impossible to say for non-array implementations without knowledge of the GC's internals... – J D Jan 01 '12 at 08:50
  • "whole-program" I don't think so. Here's that code (2nd linked): `sieve p a = sieve (p+2) (a//[(i,False) | i <- [p*p, p*p+2*p..m]])` with `//` an array-update operator. The old value goes out of scope, the new one comes in. So no, `a` is clearly, locally, an accumulator argument (of type `array`), so can safely be updated destructively. And with it, the constant factor could be low too. The main point to SoE efficiency is ability to conflate address and value, as in integer sorting, but even without that the tree-merging code gets pretty close in *complexity*. – Will Ness Jan 01 '12 at 09:27
  • so I think it's more a question of quality of implementation. In the end it will all be (must be!) just "English" compiler anyway... :) BTW why should the compiler end its work when executable file is produced? Why not continue working in the background with optimizations, tests, automagical re-writings, to produce better executable? Looking back 50 years from now, our state-of-the-art will be seen as beyond laughable for sure. Compilers should really be expert rewriting systems I think, profiling the code themselves, etc. – Will Ness Jan 01 '12 at 09:32
  • but of course as it currently is, in reality, no functional code would beat C (and derivatives). :) – Will Ness Jan 01 '12 at 09:40

2 Answers


If you are calling Seq.skip anywhere, then there's about a 99% chance that you have an O(N^2) algorithm. For nearly every elegant functional lazy Project Euler solution involving sequences, you want to use LazyList, not Seq. (See Juliet's comment link for more discussion.)
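
As a rough sketch of what that change could look like (an illustration assuming the PowerPack `LazyList` module, not tested against the question's memory constraints), `lazyDifference` can take both arguments as `LazyList`s so that neither `Seq.head` nor `Seq.skip` is ever needed:

// sketch: both arguments as LazyLists, so no Seq.skip / Seq.head is needed
let rec lazyDifference l1 l2 =
    match l1, l2 with
    | _, LazyList.Nil -> l1
    | LazyList.Cons(x, xs), LazyList.Cons(y, ys) ->
        if x < y then LazyList.consDelayed x (fun () -> lazyDifference xs l2)
        elif x = y then lazyDifference xs ys
        else lazyDifference l1 ys
    | _ -> LazyList.empty

// the multiples stream would then be built with, e.g.,
//     LazyList.ofSeq { square .. increment .. System.Int32.MaxValue }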

Brian
  • Interesting, that would surely explain the slowness. But the thing is that I previously used `LazyList` instead of a `Seq` there, too. But as `LazyList` uses caching, my tests always ran out of memory (about 1.5 TB). That's why I switched to `Seq`. Can I disable caching for `LazyList`? – primfaktor Jun 27 '11 at 05:42
  • No, but you can discard the front of a LazyList if you don't need it any more. Just make sure you're not saving any references to the front of the list, and you can "tail" into it and let the front get GC'd as you consume it. – Brian Jun 28 '11 at 03:39
  • Hi Brian, I thought `LazyList` caches inherently, is that wrong? So even if *I* keep no reference to the value of the head at some point, the *list would have to* because of the caching. Thus the 1.5 TB. Am I getting something wrong? – primfaktor Jan 17 '12 at 09:27
  • Yes. It's a singly linked list; there's nothing to cache after you tail in and discard the head. – Brian Jan 17 '12 at 17:07
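
A sketch of the "tailing in" Brian describes, assuming the consumer only ever holds the current cell:

// walk the list holding only the current cell; the consumed prefix becomes
// unreachable and can be garbage-collected
let rec countBelow limit acc ll =
    match ll with
    | LazyList.Cons(x, xs) when x < limit -> countBelow limit (acc + 1) xs
    | _ -> acc

// note: a top-level value like `let lazyPrimes = ...` itself keeps the head
// reachable, so the sieve would have to be produced by a function call
// (e.g. `let lazyPrimes () = ...`) for the consumed prefix to actually be collected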

Even if you succeed in taming the strange quadratic F# sequence design issues, there are still algorithmic improvements ahead. You are working in a (...((x-a)-b)-...) manner here. x, or around6, gets buried deeper and deeper, yet it is the most frequently producing sequence. Transform it into an (x-(a+b+...)) scheme -- or even use a tree structure there -- to gain an improvement in time complexity (sorry, that page is in Haskell). This actually gets very close to the complexity of the imperative sieve, although it is still much slower than baseline C++ code.
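
A sketch of the building block for that scheme (an F# rendering of the idea; the identifiers are illustrative):

// union of two ascending LazyLists, duplicates collapsed
let rec lazyUnion l1 l2 =
    match l1, l2 with
    | LazyList.Cons(x, xs), LazyList.Cons(y, ys) ->
        if x < y then LazyList.consDelayed x (fun () -> lazyUnion xs l2)
        elif x > y then LazyList.consDelayed y (fun () -> lazyUnion l1 ys)
        else LazyList.consDelayed x (fun () -> lazyUnion xs ys)
    | LazyList.Nil, _ -> l2
    | _ -> l1

// with this, the sieve subtracts one merged stream of composites from around6,
// instead of nesting a separate subtraction for every prime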

Measuring local empirical orders of growth as O(n^a), with a = log(t_2/t_1) / log(n_2/n_1) (in n primes produced), the ideal n log(n) log(log(n)) translates into O(n^1.12) .. O(n^1.085) behaviour on the n = 10^5..10^7 range. A simple imperative baseline in C++ achieves O(n^1.45 .. 1.18 .. 1.14), while the tree-merging code, as well as priority-queue based code, both exhibit steady O(n^1.20) behaviour, more or less. Of course C++ is ~20..15 times faster, but that's mostly just a "constant factor". :)
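
For instance, a throwaway snippet for computing that exponent from two timings (the sample numbers below are made up):

// local empirical order of growth: a = log(t2/t1) / log(n2/n1)
let localOrder (n1 : int, t1 : float) (n2 : int, t2 : float) =
    log (t2 / t1) / log (float n2 / float n1)

// localOrder (100000, 1.0) (1000000, 16.0)  evaluates to about 1.20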

Will Ness
  • Yep, damn these constant factors. ;-) Unfortunately, although I really like functional, I have a hard time reading Haskell. – primfaktor Jan 17 '12 at 09:24
  • @primfaktor take a look [here](http://stackoverflow.com/a/8871918/849891), see if it helps in reading Haskell. If you know Prolog, see also my other [recent answer](http://stackoverflow.com/a/8882745/849891). – Will Ness Jan 17 '12 at 09:28
  • @primfaktor C++ is actually 20..15 times faster, I've just corrected it. – Will Ness Jan 17 '12 at 13:13