Haskell Vector performance woes

Question

I wrote the classic prime number sieve algorithm in C which pretty much instantly generates all the primes smaller than 1,000,000 on my machine (I didn't do anything to optimize it).

In Haskell I use a Vector of Ints, and generate an update list for each prime number which sets the multiples to zero. This is basically the same thing as I do in C with a nested for loop:

import qualified Data.Vector.Unboxed as U

--| generate primes smaller than p
sievePrimes :: Int -> U.Vector Int
sievePrimes p = foldl f ns [0..p-1]
    where
        ns = U.fromList $ 0:[2..p]
        f v i
            | v U.! i == 0 = v
            | otherwise = v U.// [((i + 1)*k - 1, 0) | k <- [2..p `div` (i + 1)]]

The Haskell version runs about 1000 times slower than the C version. All I've done to optimize is to use the vector package. The algorithm is the same.

Why is it running so much slower? Isn't Vector efficient enough?

I compiled both C and Haskell with -O2.

[the genuine sieve of erastothenes](https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) — rampion, Apr 08 '16 at 01:24
There are so, so many questions on SO about prime number generation in Haskell. Did you search before you asked? If so, please take some time to outline what isn't covered well by the other questions. If not... no time like the present! — Daniel Wagner, Apr 08 '16 at 03:28
no offense but the linked question really seems to be basically the same and has an outstanding answer - together with the linked article from rampion you should be good to go - if not please appologize — Random Dev, Apr 08 '16 at 04:03
cf. https://wiki.haskell.org/Prime_numbers#Accumulating_Array. More efficient is to use mutable array, like in https://wiki.haskell.org/Prime_numbers#Using_ST_Array. There's a version of it by Daniel Fischer somewhere on CodeReview, using loops instead of `forM`, and `unsafeWrite`, which runs substantially faster. Otherwise, consult the `arithmoi` package. — Will Ness, Apr 08 '16 at 07:56
found it for you: [Daniel's answer](http://stackoverflow.com/a/15026238); [my Ideone entry](http://ideone.com/j24jxV) w/ odds-only array version of it, consistently 1.6x slower than a [comparable C++ version](http://ideone.com/fapob). and [an *immutable array* version](http://ideone.com/rhj9ub) which is about 9.6x slower than the C++ version. overall, we have STUArray(unsafeWrite,loops) | STUArray(writeArray,forM) | UArray(segmented) | UArray(contiguous accumulating array) speed ratios vs. the C++ as `1.6x | 4.8x | 9.6x | 11x-13x`. — Will Ness, Apr 08 '16 at 11:23
to clarify: the last one is the [odds-only version](http://ideone.com/RzqLP). [the one on the wiki](https://wiki.haskell.org/Prime_numbers#Accumulating_Array) is [32x slower](http://ideone.com/dE0iU) than C++. — Will Ness, Apr 08 '16 at 13:16
@WillNess, the `arithmoi` sieves are not to be trusted. They're poorly documented and extremely complicated, and until recently one of them had a tendency to segfault. — dfeuer, Apr 08 '16 at 15:58

Will Ness · Answer 1 · 2016-04-17T23:48:51.557

You need to replace your foldl call with foldl f ns [0 .. floor . sqrt . fromIntegral $ p]. This radically improves the time complexity, as explained in the haskell-wiki, because the cost of U.// is O(m+n) instead of O(n) as it is in languages with destructive update, like C; which is the reason why not stopping as soon as possible doesn't worsen the time complexity there.

But in Haskell it is crucial to stop as soon as possible with immutable data, where the time complexity of the update operation is worse, as it is for lists and Vectors too.

When analyzing the performance of a piece of code, always analyze its empirical orders of growth by measuring its speed at several problem size points and calculating logBase (n2/n1) (t2/t1).

The updated code, interpreted in GHCi on my computer, generates the n = 78,498 primes below N = 1,000,000 in 3.53s, running at ~N^1.1 or ~n^1.2, which is very decent.

Your code, measured at N = 40,000/20,000, runs at ~N^1.7 or ~n^1.9, which is terribly slow; the projected run time to 1,000,000 is 7 minutes, if the complexity stays the same.

Haskell Vector performance woes

1 Answers1