5

I am attempting problem 10 of Project Euler, which is the summation of all primes below 2,000,000. I have tried implementing the Sieve of Eratosthenes using Python, and the code I wrote works perfectly for numbers below 10,000.

However, when I attempt to find the summation of primes for bigger numbers, the code takes too long to run (finding the sum of primes up to 100,000 took 315 seconds). The algorithm clearly needs optimization.

Yes, I have looked at other posts on this website, like Fastest way to list all primes below N, but the solutions there had very little explanation as to how the code worked (I am still a beginner programmer) so I was not able to actually learn from them.

Can someone please help me optimize my code, and clearly explain how it works along the way?

Here is my code:

primes_below_number = 2000000 # number to find summation of all primes below number
numbers = (range(1, primes_below_number + 1, 2)) # creates a list excluding even numbers
pos = 0 # index position
sum_of_primes = 0 # total sum
number = numbers[pos]
while number < primes_below_number and pos < len(numbers) - 1:
    pos += 1
    number = numbers[pos] # moves to next prime in list numbers
    sum_of_primes += number # adds prime to total sum
    num = number
    while num < primes_below_number:
        num += number
        if num in numbers[:]:
            numbers.remove(num) # removes multiples of prime found

print sum_of_primes + 2

As I said before, I am new to programming, therefore a thorough explanation of any complicated concepts would be deeply appreciated. Thank you.

Daniel H.
  • Any reason you are creating a copy of `numbers[:]` when testing num? – AChampion Sep 12 '15 at 18:52
  • It _is_ unsafe to remove items from a list that you're iterating over, but your code isn't doing that, so as achampion indicates, you don't need to duplicate `numbers` here. – PM 2Ring Sep 12 '15 at 19:02

5 Answers

2

As you've seen, there are various ways to implement the Sieve of Eratosthenes in Python that are more efficient than your code. I don't want to confuse you with fancy code, but I can show how to speed up your code a fair bit.

Firstly, searching a list isn't fast, and removing elements from a list is even slower. However, Python provides a set type which is quite efficient at performing both of those operations (although it does chew up a bit more RAM than a simple list). Happily, it's easy to modify your code to use a set instead of a list.

Another optimization is that we don't have to check for prime factors all the way up to primes_below_number, which I've renamed to hi in the code below. It's sufficient to just go to the square root of hi, since if a number is composite it must have a factor less than or equal to its square root.

We don't need to keep a running total of the sum of the primes. It's better to do that at the end using Python's built-in sum() function, which operates at C speed, so it's much faster than doing the additions one by one at Python speed.

# number to find summation of all primes below number
hi = 2000000

# create a set excluding even numbers
numbers = set(xrange(3, hi + 1, 2)) 

for number in xrange(3, int(hi ** 0.5) + 1):
    if number not in numbers:
        # number must have been removed because it has a prime factor
        continue

    num = number
    while num < hi:
        num += number
        if num in numbers:
            # Remove multiples of prime found
            numbers.remove(num)

print 2 + sum(numbers)

You should find that this code runs in a few seconds; it takes around 5 seconds on my 2GHz single-core machine.

You'll notice that I've moved the comments so that they're above the line they're commenting on. That's the preferred style in Python since we prefer short lines, and also inline comments tend to make the code look cluttered.

There's another small optimization that can be made to the inner while loop, but I'll let you figure that out for yourself. :)

PM 2Ring
  • A dense set is better represented with a bitmap (an array of true/false flags), not a set that stores the actual integers, one per member. That's true whether it's a hash-based set or a flat list of numbers like the OP used. – Peter Cordes Sep 12 '15 at 22:48
  • @PeterCordes: Certainly! But the focus of this answer was to show how the speed of the OP's algorithm could be improved by using a data type that was more amenable to searching & removal. In my own Python prime sieves I tend to use lists of booleans, as you can see in the links in the comment to saulspatz's answer. – PM 2Ring Sep 14 '15 at 01:30
  • FWIW, in my C prime sieves I use bitmaps that take advantage of the fact that all primes > 30 are congruent mod 30 to one of {1, 7, 11, 13, 17, 19, 23, 29}, so each byte in the bitmap handles a block of 30 integers. Addressing such a bitmap is slightly slower than a simple bitmap, but the space saving can make it worthwhile. – PM 2Ring Sep 14 '15 at 01:31
1

First, removing numbers from the list will be very slow. Instead, make a list of boolean flags, one entry per number:

primes = [True] * primes_below_number
primes[0] = False
primes[1] = False

Now in your loop, when you find a prime p, change primes[k*p] to False for all suitable k. (You wouldn't actually multiply, of course; you'd repeatedly add p.)

At the end,

primes = [n for n in range(primes_below_number) if primes[n]]

This should be a great deal faster.

Second, you can stop looking once you find a prime greater than the square root of primes_below_number, since a composite number must have a prime factor that doesn't exceed its square root.
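Putting those two ideas together, here is a minimal sketch of this approach (the variable names are my own, and it keeps entries for the even numbers to keep the indexing simple):

primes_below_number = 2000000

# primes[n] is True while n is still a candidate prime
primes = [True] * primes_below_number
primes[0] = False
primes[1] = False

for p in xrange(2, int(primes_below_number ** 0.5) + 1):
    if primes[p]:
        # mark the multiples of p, starting at p*p: any smaller
        # multiple was already marked by a smaller prime
        n = p * p
        while n < primes_below_number:
            primes[n] = False
            n += p

print sum(n for n in xrange(primes_below_number) if primes[n])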

saulspatz
  • FWIW, this approach can be fairly fast if you use extended slice notation [like this](http://stackoverflow.com/a/26193172/4014959). And the process can be adapted to find [primes in a range](http://stackoverflow.com/a/26440674/4014959). – PM 2Ring Sep 12 '15 at 19:10
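To make that comment concrete: the extended-slice trick can replace the inner while loop in the sketch above. A hedged one-liner, assuming the same primes list and loop variable p:

# mark p*p, p*p + p, ... in one step; the arithmetic on the right
# just counts how many slots the slice covers
primes[p*p::p] = [False] * ((primes_below_number - p*p + p - 1) // p)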
0

Try using numpy; it should make this faster. Replacing range with xrange may also help. A sketch of what that might look like is below.
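Since that advice is terse, here is a hedged sketch of a numpy-based sieve (assuming numpy is installed; the names are my own). The speedup comes from the vectorized slice assignment, which marks all multiples of a prime in one C-level operation:

import numpy as np

N = 2000000
is_prime = np.ones(N, dtype=bool)  # assume everything is prime to start
is_prime[:2] = False               # 0 and 1 are not prime

for p in xrange(2, int(N ** 0.5) + 1):
    if is_prime[p]:
        is_prime[p*p::p] = False   # vectorized marking of multiples

print np.flatnonzero(is_prime).sum()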

0

Here's an optimization for your code:

import itertools

primes_below_number = 2000000
numbers = list(range(3, primes_below_number, 2))
pos = 0

while pos < len(numbers) - 1:
    number = numbers[pos]
    numbers = list(
        itertools.chain(
            itertools.islice(numbers, 0, pos + 1),
            itertools.ifilter(
                lambda n: n % number != 0,
                itertools.islice(numbers, pos + 1, len(numbers))
            )
        )
    )
    pos += 1

sum_of_primes = sum(numbers) + 2
print sum_of_primes

The optimizations here are:

  1. The sum is moved outside the loop.
  2. Instead of removing elements from a list, we just create a new one; memory is not an issue here (I hope).
  3. The new list is created by chaining two parts: the first part is everything before the current number (those are already checked), and the second part is everything after the current number, keeping only the elements that are not divisible by the current number.
  4. Using itertools can make things faster, since we use iterators instead of looping through the whole list more than once.

Another solution would be to not remove parts of the list but disable them like @saulspatz said.

And here's the fastest way I was able to find: http://www.wolframalpha.com/input/?i=sum+of+all+primes+below+2+million

Update

Here is the boolean method:

import itertools

primes_below_number = 2000000
numbers = [v % 2 != 0 for v in xrange(primes_below_number)]
numbers[0] = False
numbers[1] = False
numbers[2] = True

number = 3

while number < primes_below_number:
    n = number * 3  # start at 3 * number: 2 * number is even and already excluded
    while n < primes_below_number:
        numbers[n] = False
        n += number
    number += 1
    while number < primes_below_number and not numbers[number]:
        number += 1

# sum the indices whose flag is True: (i, False) maps to 0, (i, True) maps to i
sum_of_numbers = sum(itertools.imap(lambda index_n: index_n[1] and index_n[0] or 0, enumerate(numbers)))

print(sum_of_numbers)

This executes in seconds (took 3 seconds on my 2.4GHz machine).

mpcabd
  • There's a simple optimization for this, you could do a slice assignment instead of your first inner loop: `numbers[2*number-1::number] = [False] * (primes_below_number//number-1)` and increment by 2 in your second inner loop, `number += 2`. On my machine this reduced the time 40%. – AChampion Sep 13 '15 at 01:15
0

Instead of storing a list of numbers, you can instead store an array of boolean values. This use of a bitmap can be thought of as a way to implement a set, and it works well for dense sets (sets without big gaps between the values of the members).

An answer on a recent python sieve question uses this implementation python-style. It turns out a lot of people have implemented a sieve, or something they thought was a sieve, and then came to SO to ask why it was slow. :P Look at the related-questions sidebar of some of them if you want more reading material.

Finding the element that holds the boolean that says whether a number is in the set or not is easy and extremely fast. array[i] is a boolean value that's true if i is in the set, false if not. The memory address can be computed directly from i with a single addition.

(I'm glossing over the fact that an array of booleans might be stored with a whole byte for each element, rather than the more efficient implementation of using a single bit for each element. Any decent sieve will use a bitmap.)

Removing a number from the set is as simple as setting array[i] = false, regardless of the previous value. No searching, no comparison, no tracking of what happened, just one memory operation. (Well, two for a bitmap: load the old byte, clear the correct bit, store it. Memory is byte-addressable, but not bit-addressable.)
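To illustrate that load/clear/store sequence, here is a hedged sketch using a Python bytearray as the bitmap (the helper names are my own, not a library API):

N = 2000000
bitmap = bytearray([0xFF]) * (N // 8 + 1)  # one bit per number, all set initially

def clear(i):
    # load the byte, clear bit (i mod 8), store the byte back
    bitmap[i >> 3] &= ~(1 << (i & 7))

def test(i):
    return (bitmap[i >> 3] & (1 << (i & 7))) != 0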


An easy optimization of the bitmap-based sieve is to not even store entries for the even numbers, because there is only one even prime, and we can special-case it; that doubles our memory density. Then the membership status of i is held in array[i/2]. (Dividing by powers of two is easy for computers. Other divisors are much slower.)
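Here is a hedged sketch of that odds-only layout in Python (a plain list of booleans rather than a real bitmap, so the indexing stays visible): slot i of the array describes the odd number 2*i + 1.

N = 2000000
is_prime = [True] * (N // 2)  # one slot per odd number: 1, 3, 5, ...
is_prime[0] = False           # 1 is not prime

for p in xrange(3, int(N ** 0.5) + 1, 2):
    if is_prime[p // 2]:
        # mark odd multiples of p only; even multiples have no slot
        for multiple in xrange(p * p, N, 2 * p):
            is_prime[multiple // 2] = False

# add the special-cased even prime 2 back in at the end
print 2 + sum(2 * i + 1 for i in xrange(1, N // 2) if is_prime[i])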

An SO question, Why is Sieve of Eratosthenes more efficient than the simple "dumb" algorithm?, has many links to good stuff about the sieve. This one in particular has some good discussion about it, in words rather than just code. (Never mind the fact that it's talking about a common Haskell implementation that looks like a sieve, but actually isn't. They call this the "unfaithful" sieve in their graphs, and so on.)

Discussion on that question brought up the point that trial division may be faster than big sieves, for some uses, because clearing the bits for all multiples of every prime touches a lot of memory in a cache-unfriendly pattern. CPUs are much faster than memory these days.
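For contrast, a hedged sketch of trial division for a single number; it touches essentially no memory beyond a few locals, which is the cache-friendliness being described:

def is_prime_trial(n):
    # check divisibility by 2, then by odd candidates up to sqrt(n)
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True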

Peter Cordes