Loop indexing is well known in Python to be an incredibly slow operation. By replacing a loop with array slicing, and a list with a Numpy array, we see increases @ 3x:
import numpy as np
import timeit
def generate_primes_original(limit):
boolean_list = [False] * 2 + [True] * (limit - 1)
for n in range(2, int(limit ** 0.5 + 1)):
if boolean_list[n] == True:
for i in range(n ** 2, limit + 1, n):
boolean_list[i] = False
return np.array(boolean_list,dtype=np.bool)
def generate_primes_fast(limit):
boolean_list = np.array([False] * 2 + [True] * (limit - 1),dtype=bool)
for n in range(2, int(limit ** 0.5 + 1)):
if boolean_list[n]:
boolean_list[n*n:limit+1:n] = False
return boolean_list
limit = 1000
print(timeit.timeit("generate_primes_fast(%d)"%limit, setup="from __main__ import generate_primes_fast"))
# 30.90620080102235 seconds
print(timeit.timeit("generate_primes_original(%d)"%limit, setup="from __main__ import generate_primes_original"))
# 91.12803511600941 seconds
assert np.array_equal(generate_primes_fast(limit),generate_primes_original(limit))
# [nothing to stdout - they are equal]
To gain even more speed, one option is to use numpy vectorization. Looking at the outer loop, it's not immediately obvious how one could vectorize that.
Second, you will see dramatic speed-ups if you port to Cython, which should be a fairly seamless process.
Edit: you may also see improvements by changing things like n**2 => math.pow(n,2)
, but minor improvements like that are inconsequential compared to the bigger problem, which is the iterator.