0

I'm relatively new to the python world, and the coding world in general, so I'm not really sure how to go about optimizing my python script. The script that I have is as follows:

import math
z = 1
x = 0
while z != 0:
    x = x+1
    if x == 500:
        z = 0
    calculated = open('Prime_Numbers.txt', 'r')
    readlines = calculated.readlines()
    calculated.close()
    a = len(readlines)
    b = readlines[(a-1)]

    b = int(b) + 1
    for num in range(b, (b+1000)):
        prime = True
        calculated = open('Prime_Numbers.txt', 'r')
        for i in calculated:
            i = int(i)
            q = math.ceil(num/2)
            if (q%i==0):
                prime = False
        if prime:
            calculated.close()
            writeto = open('Prime_Numbers.txt', 'a')
            num = str(num)
            writeto.write("\n" + num)
            writeto.close()
            print(num)

As some of you can probably guess I'm calculating prime numbers. The external file that it calls on contains all the prime numbers between 2 and 20. The reason that I've got the while loop in there is that I wanted to be able to control how long it ran for.

If you have any suggestions for cutting out any clutter in there could you please respond and let me know, thanks.

  • 1
    If you want a general review of your code you might want to consider posting on http://codereview.stackexchange.com/ instead. –  Jan 02 '14 at 06:51
  • 1
    Optimizing the code for a poor algorithm won't speed it up by much. Do some more research and find a better algorithm. – Blender Jan 02 '14 at 06:55
  • 1
    [This](http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes) may be of interest... – Sinkingpoint Jan 02 '14 at 06:56
  • Or more generally [this](https://en.wikipedia.org/wiki/Generating_primes). –  Jan 02 '14 at 06:59
  • This has been tackled in Python a number of times. My own solution for calculating primes can be found at http://stackoverflow.com/questions/9301781/fastest-in-term-of-space-way-to-find-prime-numbers-with-python/9302299#9302299 I don't claim that it's the best. – Mark Ransom Jan 02 '14 at 08:04

3 Answers3

2

Reading and writing to files is very, very slow compared to operations with integers. Your algorithm can be sped up 100-fold by just ripping out all the file I/O:

import itertools

primes = {2}  # A set containing only 2

for n in itertools.count(3):  # Start counting from 3, by 1
    for prime in primes:      # For every prime less than n
        if n % prime == 0:    # If it divides n
            break             # Then n is composite
    else:
        primes.add(n)         # Otherwise, it is prime
        print(n)

A much faster prime-generating algorithm would be a sieve. Here's the Sieve of Eratosthenes, in Python 3:

end = int(input('Generate primes up to: '))
numbers = {n: True for n in range(2, end)}  # Assume every number is prime, and then

for n, is_prime in numbers.items():         # (Python 3 only)
    if not is_prime:
        continue                            # For every prime number

    for i in range(n ** 2, end, n):         # Cross off its multiples
        numbers[i] = False

    print(n)
Blender
  • 289,723
  • 53
  • 439
  • 496
  • Is there a reason you use a set instead of list or deque? –  Jan 02 '14 at 07:29
  • @Nabla: I'm just going off the mathematical algorithm for simplicity. – Blender Jan 02 '14 at 07:32
  • 1
    I just notice, that probably your implementation is not really the Sieve of Eratosthenes, because you cross-out the multiples of all numbers, not just primes. This is redundant. See also Mark Ransom's implementation. –  Jan 02 '14 at 09:18
  • marked your 2nd code as Python 3. In Python 2.7 it just produces all the numbers from 2 to end. -- I like your description of the sieve in the code comments, it is very succinct. – Will Ness Jan 03 '14 at 11:10
  • @WillNess: Which part is Python 3-only? Also, you don't need the `if` statement. The iterable returned by `range` won't yield any numbers if `n ** 2 >= end`. – Blender Jan 03 '14 at 12:11
  • You're right, `if` is not needed. My bad, sorry. (I'll leave it for you to edit, since you caught it). -- as for Python-3, I tested (on Ideone) exactly the same code in both environments. In [Python 2.7](http://ideone.com/pjA0Rp) all numbers from 2 to 100 are returned, [in Python 3](http://ideone.com/Amox8L) just primes. I guess in P 3 `numbers.items()` is lazy, and the changes *are* reflected - you do mutate the sequence iterated upon. In P 2 it is probably not lazy. Or something to that effect. – Will Ness Jan 03 '14 at 13:32
  • @WillNess: Ah, right. There's no nice way to do `numbers.iteritems()` for both at once, so I'll just let it be. – Blender Jan 03 '14 at 16:14
1

It is very inefficient to keep storing and loading all primes from a file. In general file access is very slow. Instead save the primes to a list or deque. For this initialize calculated = deque() and then simply add new primes with calculated.append(num). At the same time output your primes with print(num) and pipe the result to a file.

When you found out that num is not a prime, you do not have to keep checking all the other divisors. So break from the inner loop:

if q%i == 0:
    prime = False
    break

You do not need to go through all previous primes to check for a new prime. Since each non-prime needs to factorize into two integers, at least one of the factors has to be smaller or equal sqrt(num). So limit your search to these divisors.

Also the first part of your code irritates me.

z = 1
x = 0
while z != 0:
    x = x+1
    if x == 500:
        z = 0

This part seems to do the same as:

for x in range(500):

Also you limit with x to 500 primes, why don't you simply use a counter instead, that you increase if a prime is found and check for at the same time, breaking if the limit is reached? This would be more readable in my opinion.

In general you do not need to introduce a limit. You can simply abort the program at any point in time by hitting Ctrl+C.

However, as others already pointed out, your chosen algorithm will perform very poor for medium or large primes. There are more efficient algorithms to find prime numbers: https://en.wikipedia.org/wiki/Generating_primes, especially https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.

0

You're writing a blank line to your file, which is making int() traceback. Also, I'm guessing you need to rstrip() off your newlines.

I'd suggest using two different files - one for initial values, and one for all values - initial and recently computed.

If you can keep your values in memory a while, that'd be a lot faster than going through a file repeatedly. But of course, this will limit the size of the primes you can compute, so for larger values you might return to the iterate-through-the-file method if you want.

For computing primes of modest size, a sieve is actually quite good, and worth a google.

When you get into larger primes, trial division by the first n primes is good, followed by m rounds of Miller-Rabin. If Miller-Rabin probabilistically indicates the number is probably a prime, then you do complete trial division or AKS or similar. Miller Rabin can say "This is probably a prime" or "this is definitely composite". AKS gives a definitive answer, but it's slower.

FWIW, I've got a bunch of prime-related code collected together at http://stromberg.dnsalias.org/~dstromberg/primes/

dstromberg
  • 6,954
  • 1
  • 26
  • 27
  • There is no blank line. `write` does not write a newline itself and the initial file is already populated as per statement by the questioner, presumably without newline at the end. –  Jan 02 '14 at 06:49
  • Checking to see if any of the the primes from 3 to floor(sqrt(n)) actually, but I dare not add more to the subject of the search for primes :P – Alec Teal Jan 02 '14 at 06:51
  • Also the trailing newline for every line is stripped automatically. –  Jan 02 '14 at 06:55
  • 1
    @dstromberg what blank line? There isn't any blank line. Here's the output from when I set it up to run overnight last night. [link](http://www.dropbox.com/s/gcwrj4sgckm9r3d/Prime_Numbers.txt) – Tschloken Jan 02 '14 at 06:58