5

I have written a Sieve of Eratosthenes--I think--but it seems like it's not as optimized as it could be. It works, and it gets all the primes up to N, but not as quickly as I'd hoped. I'm still learning Python--coming from two years of Java--so if something isn't particularly Pythonic then I apologize:

def sieve(self):
        is_prime = [False, False, True, True] + [False, True] * ((self.lim - 4) // 2)
        for i in range(3, self.lim, 2):
            if i**2 > self.lim: break
            if is_prime[i]:
                for j in range(i * i, self.lim, i * 2):
                    is_prime[j] = False
        return is_prime

I've looked at other questions similar to this one but I can't figure out how some of the more complicated optimizations would fit in to my code. Any suggestions?

EDIT: as requested, some of the other optimizations I've seen are stopping the iteration of the first for loop before the limit, and skipping by different numbers--which I think is wheel optimization?

EDIT 2: Here's the code that would utilize the method, for Padraic:

primes = sieve.sieve()
for i in range(0, len(primes)):
    if primes[i]:
        print("{:d} ".format(i), end = '')
print() # print a newline
  • Could you give a brief snippet explaining those other optimizations you are referencing? – BlackVegetable Jun 29 '15 at 16:40
  • The second `if i**2 > self.lim: break` line is superfluous. – jwodder Jun 29 '15 at 16:40
  • @BlackVegetable Question edited. –  Jun 29 '15 at 16:55
  • Does this actually work or are you missing a filter step? – Padraic Cunningham Jun 29 '15 at 17:02
  • @PadraicCunningham it does work, I've tested several cases. –  Jun 29 '15 at 17:06
  • How are you using it? You end up with a list of booleans not numbers – Padraic Cunningham Jun 29 '15 at 17:08
  • @PadraicCunningham well, the indexes are either true or false. 'is_prime[2]' will equal 'True'. Therefore, 2 is prime. The 0 index is 'False', so zero is not prime. You could convert it into an integer list if you wanted. It's for the sake of memory. –  Jun 29 '15 at 17:16
  • 3
    Slice assignment is a standard technique you can use to speed up the inner loop of setting multiples to false. I was going to type up an answer, but a simple google search yields results better than I could have --- for example, http://codereview.stackexchange.com/questions/42420/sieve-of-eratosthenes-python – sykora Jun 29 '15 at 17:32
  • 2
    you can use `range(i * i, self.lim, 2*i)`. – Will Ness Jun 29 '15 at 17:56

1 Answers1

2

a slightly different approach: use a bitarray to represent the odd numbers 3,5,7,... saving some space compared to a list of booleans.

this may save some space only and not help speedup...

from bitarray import bitarray

def index_to_number(i): return 2*i+3
def number_to_index(n): return (n-3)//2

LIMIT_NUMBER = 50
LIMIT_INDEX = number_to_index(LIMIT_NUMBER)+1

odd_primes = bitarray(LIMIT_INDEX)
# index  0 1 2 3
# number 3 5 7 9

odd_primes.setall(True)

for i in range(LIMIT_INDEX):
    if odd_primes[i] is False:
        continue
    n = index_to_number(i)
    for m in range(n**2, LIMIT_NUMBER, 2*n):
        odd_primes[number_to_index(m)] = False

primes = [index_to_number(i) for i in range(LIMIT_INDEX)
          if odd_primes[i] is True]
primes.insert(0,2)

print('primes: ', primes)

the same idea again; but this time let bitarray handle the inner loop using slice assignment. this may be faster.

for i in range(LIMIT_INDEX):
    if odd_primes[i] is False:
        continue
    odd_primes[2*i**2 + 6*i + 3:LIMIT_INDEX:2*i+3] = False

(none of this code has been seriously checked! use with care)


in case you are looking for a primes generator based on a different method (wheel factorizaition) have a look at this excellent answer.

Community
  • 1
  • 1
hiro protagonist
  • 44,693
  • 14
  • 86
  • 111
  • That's actually a really interesting approach. Saves space for sure. –  Jun 29 '15 at 18:17
  • I went ahead and tried to integrate your solution and the other ones, and came up with a combination that seems to work well. My original algorithm took ~800ms for 1,000,000. My new one depends on how you do it. If you use a bitarray (not optimized to only do odds) then it takes ~200ms. Okay speed up, but the real benefit is in the fact that its memory footprint is basically nonexistent. (using sys.getsizeof()). If you use a boolean array instead, then it takes ~60-80ms. –  Jun 30 '15 at 01:02
  • personally I prefer n=2i+1 addressing scheme (at the cost of wasting one memory cell) as it allows for nice closed form of indexing: for the i-th entry, its value is (2i+1), its square is 4i^2+4i+1, the square's index is (4i^2+4i+1-1)/2 = 2i^2+2i = 2i(i+1); and from here we increment by 2(2i+1) in value, which is (2i+1) increment in index. cf. a [c++ code](http://ideone.com/fapob). – Will Ness May 14 '17 at 11:45
  • @WillNess that looks like a great improvement! will have a closer look at it. thanks a lot! you really seem to be one of *the* experts here for that kind of stuff! – hiro protagonist May 14 '17 at 14:56
  • @hiroprotagonist thanks. I'm not really, I just used to answer a lot of simpler questions about it once. I never went for the complicated stuff. – Will Ness May 14 '17 at 16:40
  • @hiroprotagonist for the "expert" stuff see e.g. https://codegolf.stackexchange.com/questions/74269/calculate-the-number-of-primes-up-to-n/ :) – Will Ness May 14 '17 at 17:56