Is my Sieve of Eratosthenes implemented correctly? (Python)

Question

I need to generate a large number of prime numbers, however it is taking far too long using the Sieve of Eratosthenes. Currently it takes roughly 3 seconds to generate primes under 100,000 and roughly 30 for primes under 1,000,000. Which seems to indicate an O(n) complexity but as far as I know that's not right. Code:

def generate_primes(limit):
    boolean_list = [False] * 2 + [True] * (limit - 1)
    for n in range(2, int(limit ** 0.5 + 1)):
        if boolean_list[n] == True:
            for i in range(n ** 2, limit + 1, n):
                boolean_list[i] = False

Am I missing something obvious? How can I improve the performance of the sieve?

If your algorithm is O(n) congratulations, it's faster than any known implementation (it's usually O(n log log n) according to the Wikipedia page). But it's not clear why that's a problem. — Denziloe, Aug 30 '18 at 20:09
`if boolean_list[n] == True:` => `if boolean_list[n]:` apart from that I don't see how it could be improved — Jean-François Fabre, Aug 30 '18 at 20:09
maybe use `numpy` to set to false with vectorized indices to avoid the inner loop? — Jean-François Fabre, Aug 30 '18 at 20:11
@Denziloe Maybe I've misunderstood the time complexity for it. I was under the assumption that the Sieve of Eratosthenes was meant to run faster and I wasn't sure if there was a mistake with my implementation of it — danjedwards123, Aug 30 '18 at 20:12
@Denziloe Just in general. Are there any quicker ways to generate large primes? — danjedwards123, Aug 30 '18 at 20:16
@danjedwards123 Yes, I'm sure there are faster ways than Eratosthenes. Is your implementation of Eratosthenes worryingly slow? I don't see any reason to think so. Python is a high-level language and iterating in Python can often be around 100 times slower than using a lower-level language such as C. If you want to use Python you might want to look into `numpy` which provides an interface to fast arrays written in C. — Denziloe, Aug 30 '18 at 20:21
@Serge They are the same in this case as ** is the power operator — danjedwards123, Aug 30 '18 at 20:22
@danjedwards123 I ran your code basically unchanged on a Raspberry pi 1 model B, no overclocking and got 0.6 seconds and 6.65 seconds respectively for limits 10^5 and 10^6. Are you perhaps trying it out in debugging mode in an IDE? that could add delays on each iteration and this code has lots of them — Ilia Gilmijarow, Aug 30 '18 at 20:39
Please see: [Why is “Is this correct?” an off topic question, and what should I ask instead?](https://meta.stackoverflow.com/questions/359466/why-is-is-this-correct-an-off-topic-question-and-what-should-i-ask-instead) — EJoshuaS - Stand with Ukraine, Aug 30 '18 at 23:28

Daniel R. Livingston · Accepted Answer · 2018-08-30T20:36:35.107

Loop indexing is well known in Python to be an incredibly slow operation. By replacing a loop with array slicing, and a list with a Numpy array, we see increases @ 3x:

import numpy as np
import timeit

def generate_primes_original(limit):
    boolean_list = [False] * 2 + [True] * (limit - 1)
    for n in range(2, int(limit ** 0.5 + 1)):
        if boolean_list[n] == True:
            for i in range(n ** 2, limit + 1, n):
                boolean_list[i] = False
    return np.array(boolean_list,dtype=np.bool)

def generate_primes_fast(limit):

    boolean_list = np.array([False] * 2 + [True] * (limit - 1),dtype=bool)
    for n in range(2, int(limit ** 0.5 + 1)):
        if boolean_list[n]:
            boolean_list[n*n:limit+1:n] = False
    return boolean_list

limit = 1000

print(timeit.timeit("generate_primes_fast(%d)"%limit, setup="from __main__ import generate_primes_fast"))
# 30.90620080102235 seconds

print(timeit.timeit("generate_primes_original(%d)"%limit, setup="from __main__ import generate_primes_original"))
# 91.12803511600941 seconds

assert np.array_equal(generate_primes_fast(limit),generate_primes_original(limit))
# [nothing to stdout - they are equal]

To gain even more speed, one option is to use numpy vectorization. Looking at the outer loop, it's not immediately obvious how one could vectorize that.

Second, you will see dramatic speed-ups if you port to Cython, which should be a fairly seamless process.

Edit: you may also see improvements by changing things like n**2 => math.pow(n,2), but minor improvements like that are inconsequential compared to the bigger problem, which is the iterator.

`math.pow` is quite a bit slower than using the exponent operator, actually. https://stackoverflow.com/questions/20969773/exponentials-in-python-x-y-vs-math-powx-y — mypetlion, Aug 30 '18 at 20:46
Thanks so much for this. I wasn't expecting such an improvement, and I'll look into the other two recommendations you gave. — danjedwards123, Aug 30 '18 at 20:55
@mypetlion - that's interesting, I didn't know that. It always surprises me how high the overhead for function calls is. Thanks! — Daniel R. Livingston, Aug 30 '18 at 22:45
@danjedwards123 - good luck! If speed is critically important and you don't want to move to a compiled language, try checking out Julia - it's an insanely fast JIT language that is perfectly suited for this usecase :) — Daniel R. Livingston, Aug 30 '18 at 22:46

score 0 · Answer 2 · answered Aug 30 '18 at 20:47

0

If your are still using Python 2 use xrange instead of range for greater speed

answered Aug 30 '18 at 20:47

Serge

3,387
3
16
34

Is my Sieve of Eratosthenes implemented correctly? (Python)

2 Answers2