3

I'm fairly new to programming and I decided to do some exercises to improve my abilities. I'm stuck with an exercise: "Find the sum of all the primes below two million." My code is too slow.

Initially, I tried to solve as a normal prime problem and ended up with this:

sum = 2 + 3
for i in range (5, 2000000, 2):
    for j in range (3, i, 2):
        if i%j == 0:
            break
    else:
        sum += i
print(sum)

In this way, all even numbers would be excluded from the loop. But it did not solve my problem. The magnitude here is really big.

So I tried to understand what was happening with this code. I have a loop inside a loop, and the loop inside the loop runs the index of the outside loop times (not quite that because the list doesn't start from 0), right? So, when I try to find prime numbers under 20, it runs the outside loop 8 times, but the inside loop 60 (I don't know if this math is correct, as I said, I'm quite knew to programming). But when I use it with 2,000,000, I'm running the inside loop something like 999,993,000,012 times in total, and that is madness.

My friend told me about the Sieve of Eratosthenes, and I tried to create a new code:

list = [2]
list.extend(range(3, 2000000, 2))
for i in list:
    for j in list:
        if j%i == 0 and j > i:
            list.remove(j)
print(sum(list))

And that's what I achieved trying to simulate the sieve (Ignoring even numbers helped). it's a lot faster (with the other code, it would take a long time to find primes under 200,000, and with this new one I can do it) but it is not enough to compute 2,000,000,000 in a reasonable time. The code is running in the background since I started to write, and still nothing. I don't know how many times this thing is looping and I am too tired to think about it now.

I came here to ask for help. Why is it so slow? What should I learn/read/do to improve my code? Is there any other method more efficient than this sieve? Thank you for your time.

1 Answers1

2

Because list.remove is a O(n) operation, and you're doing it a lot. And you're not performing a true sieve, just trial division in disguise; you're still doing all the remainder testing you did in the original code.

A Sieve of Eratosthenes typically is implemented with an array of flags; in the simplest form, each index corresponds to the same number, and the value is initially True for all indices but 0 and 1. You iterate along, and when you find a True value, you set all indices that are multiples of it to False. This means the work is sequential addition, not multiplication, not division (which are much more expensive.

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • Could you explain a little more? – settifoglio Jan 22 '16 at 01:00
  • Okay, covering the first sieving case. As you iterate, you find that index 2 is true (prime). At this point, you know that all multiples of 2 are not prime by definition. So instead of any multiplication or division, you can just loop from `2 * 2` to `2000000`, stepping by `2` (e.g. using `range(p * p, len(flags), p)`), and set the entry in the flag list to `False`. No large memory moves are required (the flags that aren't multiples aren't even checked), no multiplication or remainder tests occur (and `xrange`/`range` does the addition more quickly). – ShadowRanger Jan 22 '16 at 01:09
  • `3` is the same deal; the flag is still true, so it's prime, and you'd be setting all flags from `3 * 3` to `2000000` stepping by `3` to `False` (actually, as an optimization, for all but `2`, you can step by `2 * p`, or `6` for `3` because the even numbered values are definitely not prime). – ShadowRanger Jan 22 '16 at 01:11
  • @settifoglio: If you want examples, [Rosetta code has multiple implementations in Python](http://rosettacode.org/wiki/Sieve_of_Eratosthenes#Python). They're not hyperoptimized (the "array" based version could be vastly improved on both speed and performance by using `bytearray` cleverly), but they're algorithmically dramatically superior to what you've got, and should be more than sufficient for sieving the first 2M numbers for primes (with hyperoptimized code, you could easily reduce the work by a factor of 10 or more, but shaving a second or two doesn't matter much here). – ShadowRanger Jan 22 '16 at 01:28
  • Thank you, sir. I will certainly try your advice and Rosetta. What you said makes a lot of sense, thank you very much. – settifoglio Jan 22 '16 at 03:02