
I'm pretty new to algorithms and runtimes, and I'm trying to optimise a bit of my code for a personal project.

import math
for num in range(0, 10000000000000000000000):
    if all((num**(num+1)+(num+1)**(num))%i!=0 for i in range(2,int(math.sqrt((num**(num+1)+(num+1)**(num))))+1)):
        print(num)

What can I do to speed this up? I know that num=80 should work but my code isn't getting past num=0, 1, 2 (it's not fast enough).

First I define my range; then, if 'such-and-such' has no factor in the range from 2 to sqrt(such-and-such) + 1, I print that number. Testing divisors up to sqrt(n) is sufficient to establish the primality of n.
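In other words, the check is equivalent to factoring the test out into a helper like this (a minimal sketch of the same trial-division idea, just for illustration):

```python
import math

def is_prime(n):
    """Trial division: test every divisor from 2 up to sqrt(n) inclusive."""
    if n < 2:
        return False
    return all(n % i != 0 for i in range(2, int(math.sqrt(n)) + 1))
```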

This is a primality test of sequence A051442

1 Answer
You would probably get a boost from computing (num**(num+1)+(num+1)**(num)) only once per iteration instead of once per candidate divisor, which is roughly sqrt of that quantity many times. This greatly reduces the constant factor in your runtime, but it won't change the fundamental complexity, because you still need to compute a remainder for every candidate divisor. Change

if all((num**(num+1)+(num+1)**(num))%i!=0 for i in range(2,int(math.sqrt((num**(num+1)+(num+1)**(num))))+1)):

to

k = num**(num+1)+(num+1)**(num)
if all(k%i for i in range(2,int(math.sqrt(k))+1)):

The != 0 is implicit in Python: any non-zero remainder is truthy.
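To see that the two forms agree, here is a quick check with a small stand-in value (6562 is just an arbitrary composite chosen for illustration):

```python
k = 6562  # small composite stand-in for the huge num**(num+1) + (num+1)**num
# Explicit comparison vs. relying on truthiness of non-zero remainders
with_comparison = all(k % i != 0 for i in range(2, int(k**0.5) + 1))
without = all(k % i for i in range(2, int(k**0.5) + 1))
print(with_comparison, without)  # both False: 6562 = 2 * 17 * 193
```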

Update

All this is just a trivial improvement to an extremely inefficient algorithm. The biggest speedup I can think of is to restrict the check k % i to prime i only. For any composite i = a * b such that k % i == 0, it must be the case that k % a == 0 and k % b == 0 (if k is divisible by i, it must also be divisible by the factors of i).
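As a quick sanity check of that claim, trial division by primes only gives the same verdict as trial division by every integer (the hard-coded list below is just the primes up to 100, enough to cover sqrt(k) for k below 10000):

```python
import math

def trial_all(k):
    # Divide by every integer up to sqrt(k)
    return all(k % i for i in range(2, int(math.sqrt(k)) + 1))

def trial_primes(k):
    # Divide only by the primes up to sqrt(k)
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
              53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
    return all(k % p for p in primes if p * p <= k)

# Both tests agree on every k in range
for k in range(2, 10000):
    assert trial_all(k) == trial_primes(k)
```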

I am assuming that you don't want to use any kind of pre-computed prime table in your code. In that case, you can compute the table yourself. This involves checking all the numbers up to a given sqrt(k) only once ever, instead of once per iteration of num, since we can stash the previously computed primes in, say, a list. That effectively raises the lower limit of the range in your current all from 2 to the square root of the previous k.

Let's define a function to extend our list of primes by trial division, checking each new candidate only against the primes found so far:

from math import sqrt

def extend(primes, from_, to):
    """
    primes: a list containing the prime numbers from 2 to `from_ - 1`, in order
    from_: the number to start checking with
    to: the number to end with (inclusive)
    """
    if not primes:
        primes.extend([2, 3])
    for k in range(max(from_, 5), to + 1):
        s = int(sqrt(k))  # No need to compute this more than once per k
        for p in primes:
            if p > s:
                # Reached sqrt(k) -> short circuit success
                primes.append(k)
                break
            elif not k % p:
                # Found factor -> short circuit failure
                break

Now we can use this function to extend our list of primes at every iteration of the original loop. This allows us to check the divisibility of k only against the slowly growing list of primes, not against all numbers:

primes = []
prev = 0
for num in range(10000000000000000000000):
    k = num**(num + 1) + (num + 1)**num
    lim = int(sqrt(k)) + 1
    extend(primes, prev, lim)
    #print('Num={}, K={}, checking {}-{}, p={}'.format(num, k, prev, lim, primes), end='... ')
    if k <= 3 and k in primes or all(k % i for i in primes):
        print('{}: {} Prime!'.format(num, k))
    else:
        print('{}: {} Nope'.format(num, k))
    prev = lim + 1

I am not 100% sure that my extend function is optimal, but I am able to get to num == 13, k == 4731091158953433 in <10 minutes on my ridiculously old and slow laptop, so I guess it's not too bad. That means that the algorithm builds a complete table of primes up to ~7e7 in that time.

Update #2

A sort-of-but-not-really optimization you could do would be to check all(k % i for i in primes) before calling extend. This would save you a lot of cycles for numbers that have small prime factors, but would probably catch up to you later on, when you would end up having to compute all the primes up to some enormous number. Here is a sample of how you could do that:

primes = []
prev = 0
for num in range(10000000000000000000000):
    k = num**(num + 1) + (num + 1)**num
    lim = int(sqrt(k)) + 1
    if not all(k % i for i in primes):
        print('{}: {} Nope'.format(num, k))
        continue
    start = len(primes)
    extend(primes, prev, lim)
    if all(k % i for i in primes[start:]):
        print('{}: {} Prime!'.format(num, k))
    else:
        print('{}: {} Nope'.format(num, k))
    prev = lim + 1

While this version does not do much for the long run, it does explain why you were able to get to 15 so quickly in your original run. The prime table does not get extended after num == 3 until num == 16, which is when the terrible delay occurs in this version as well. The net runtime to 16 should be identical in both versions.

Update #3

As @paxdiablo suggests, the only numbers we need to consider in extend are multiples of 6 +/- 1. We can combine that with the fact that only a small number of primes generally need to be tested, and convert the functionality of extend into a generator that will only compute as many primes as absolutely necessary. Using Python's lazy generation should help. Here is a completely rewritten version:

from itertools import count
from math import ceil, sqrt

prime_table = [2, 3]

def prime_candidates(start=0):
    """
    Infinite generator of prime number candidates starting with the
    specified number.

    Candidates are 2, 3 and all numbers that are of the form 6n-1 and 6n+1
    """
    if start <= 3:
        if start <= 2:
            yield 2
        yield 3
        start = 5
        delta = 2
    else:
        m = start % 6
        if m < 2:
            start += 1 - m
            delta = 4
        else:
            start += 5 - m
            delta = 2
    while True:
        yield start
        start += delta
        delta = 6 - delta

def isprime(n):
    """
    Checks if `n` is prime.

    All primes up to sqrt(n) are expected to already be present in
    the generated `prime_table`.
    """
    s = int(ceil(sqrt(n)))
    for p in prime_table:
        if p > s:
            break
        if not n % p:
            return False
    return True

def generate_primes(max):
    """
    Generates primes up to the specified maximum.

    First the existing table is yielded. Then, the new primes are
    found in the sequence generated by `prime_candidates`. All verified
    primes are added to the existing cache.
    """
    for p in prime_table:
        if p > max:
            return
        yield p
    for k in prime_candidates(prime_table[-1] + 1):
        if isprime(k):
            prime_table.append(k)
            if k > max:
                # Putting the return here ensures that we always stop on a prime and therefore don't do any extra work
                return
            else:
                yield k

for num in count():
    k = num**(num + 1) + (num + 1)**num
    lim = int(ceil(sqrt(k)))
    b = all(k % i for i in generate_primes(lim))
    print('n={}, k={} is {}prime'.format(num, k, '' if b else 'not '))
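To sanity-check the candidate pattern, here is the same generator on its own with a couple of quick probes (this duplicate is just for illustration, not part of the solution):

```python
from itertools import islice

def prime_candidates(start=0):
    # Same generator as above: 2, 3, then all numbers of the form 6n-1 and 6n+1
    if start <= 3:
        if start <= 2:
            yield 2
        yield 3
        start = 5
        delta = 2
    else:
        m = start % 6
        if m < 2:
            start += 1 - m
            delta = 4
        else:
            start += 5 - m
            delta = 2
    while True:
        yield start
        start += delta
        delta = 6 - delta

print(list(islice(prime_candidates(), 10)))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 25] -- 25 shows these are candidates, not primes
print(list(islice(prime_candidates(8), 4)))
# [11, 13, 17, 19] -- first 6n+/-1 candidates at or above 8
```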

This version gets to 15 almost instantly. It gets stuck at 16 because the smallest prime factor for k=343809097055019694337 is 573645313. Some future expectations:

  • 17 should be a breeze: 16248996011806421522977 has factor 19
  • 18 will take a while: 812362695653248917890473 has factor 22156214713
  • 19 is easy: 42832853457545958193355601 is divisible by 3
  • 20 also easy: 2375370429446951548637196401 is divisible by 58967
  • 21: 138213776357206521921578463913 is divisible by 13
  • 22: 8419259736788826438132968480177 is divisible by 103
  • etc... (link to sequence)

So in terms of instant gratification, this method will get you much further if you can make it past 18 (which will take >100 times longer than getting past 16, which in my case took ~1.25hrs).

That being said, your greatest speedup at this point would come from rewriting this in C or some similar low-level language that does not have as much per-loop overhead.

Update #4

Just for giggles, here is an implementation of the latest Python version in C. I chose to go with GMP for arbitrary precision integers, because it is easy to use and install on my Red Hat system, and the docs are very clear:

#include <stdio.h>
#include <stdlib.h>

#include <gmp.h>

typedef struct {
    size_t alloc;
    size_t size;
    mpz_t *numbers;
} PrimeTable;

void init_table(PrimeTable *buf)
{
    buf->alloc = 0x100000L;
    buf->size = 2;
    buf->numbers = malloc(buf->alloc * sizeof(mpz_t));
    if(buf->numbers == NULL) {
        fprintf(stderr, "No memory for prime table\n");
        exit(1);
    }
    mpz_init_set_si(buf->numbers[0], 2);
    mpz_init_set_si(buf->numbers[1], 3);
    return;
}

void append_table(PrimeTable *buf, mpz_t number)
{
    if(buf->size == buf->alloc) {
        size_t new = 2 * buf->alloc;
        mpz_t *tmp = realloc(buf->numbers, new * sizeof(mpz_t));
        if(tmp == NULL) {
            fprintf(stderr, "Ran out of memory for prime table\n");
            exit(1);
        }
        buf->alloc = new;
        buf->numbers = tmp;
    }
    mpz_set(buf->numbers[buf->size], number);
    buf->size++;
    return;
}

size_t print_table(PrimeTable *buf, FILE *file)
{
    size_t i, n;

    n = fprintf(file, "Array contents = [");
    for(i = 0; i < buf->size; i++) {
        n += mpz_out_str(file, 10, buf->numbers[i]);
        if(i < buf->size - 1)
            n += fprintf(file, ", ");
    }
    n += fprintf(file, "]\n");
    return n;
}

void free_table(PrimeTable *buf)
{
    size_t i;

    for(i = 0; i < buf->size; i++)
        mpz_clear(buf->numbers[i]);
    free(buf->numbers);
    return;
}

int isprime(mpz_t num, PrimeTable *table)
{
    mpz_t max, rem, next;
    size_t i, d, r;

    mpz_inits(max, rem, NULL);
    mpz_sqrtrem(max, rem, num);

    // Check if perfect square: definitely not prime
    if(!mpz_cmp_si(rem, 0)) {
        mpz_clears(rem, max, NULL);
        return 0;
    }

    /* Normal table lookup */
    for(i = 0; i < table->size; i++) {
        // Got to sqrt(n) -> prime
        if(mpz_cmp(max, table->numbers[i]) < 0) {
            mpz_clears(rem, max, NULL);
            return 1;
        }
        // Found a factor -> not prime
        if(mpz_divisible_p(num, table->numbers[i])) {
            mpz_clears(rem, max, NULL);
            return 0;
        }
    }

    /* Extend table and do lookup */
    // Start with last found prime + 2
    mpz_init_set(next, table->numbers[i - 1]);
    mpz_add_ui(next, next, 2);
    // Find nearest number of form 6n-1 or 6n+1
    r = mpz_fdiv_ui(next, 6);
    if(r < 2) {
        mpz_add_ui(next, next, 1 - r);
        d = 4;
    } else {
        mpz_add_ui(next, next, 5 - r);
        d = 2;
    }
    // Step along numbers of form 6n-1/6n+1. Check each candidate for
    // primality. Don't stop until next prime after sqrt(n) to avoid
    // duplication.
    for(;;) {
        if(isprime(next, table)) {
            append_table(table, next);
            if(mpz_divisible_p(num, next)) {
                mpz_clears(next, rem, max, NULL);
                return 0;
            }
            if(mpz_cmp(max, next) <= 0) {
                mpz_clears(next, rem, max, NULL);
                return 1;
            }
        }
        mpz_add_ui(next, next, d);
        d = 6 - d;
    }

    // Return can only happen from within loop.
}

int main(int argc, char *argv[])
{
    PrimeTable table;
    mpz_t k, a, b;
    size_t n, next;
    int p;

    init_table(&table);
    mpz_inits(k, a, b, NULL);
    for(n = 0; ; n = next) {
        next = n + 1;
        mpz_set_ui(a, n);
        mpz_pow_ui(a, a, next);
        mpz_set_ui(b, next);
        mpz_pow_ui(b, b, n);
        mpz_add(k, a, b);
        p = isprime(k, &table);
        printf("n=%zu k=", n);
        mpz_out_str(stdout, 10, k);
        printf(" p=%d\n", p);
        //print_table(&table, stdout);
    }
    mpz_clears(b, a, k, NULL);
    free_table(&table);
    return 0;
}

While this version has the exact same algorithmic complexity as the Python one, I expect it to run a few orders of magnitude faster because of the relatively minimal overhead incurred in C. And indeed, it took about 15 minutes to get stuck at n == 18, which is ~5 times faster than the Python version so far.

Update #5

This is going to be the last one, I promise.

GMP has a function called mpz_nextprime, which offers a potentially much faster implementation of this algorithm, especially with caching. According to the docs:

This function uses a probabilistic algorithm to identify primes. For practical purposes it’s adequate, the chance of a composite passing will be extremely small.

This means that it is probably much faster than the current prime generator I implemented, with a slight cost offset of some false primes being added to the cache. This cost should be minimal: even adding a few thousand extra modulo operations should be fine if the prime generator is faster than it is now.

The only part that needs to be replaced/modified is the portion of isprime below the comment /* Extend table and do lookup */. Basically that whole section just becomes a series of calls to mpz_nextprime instead of recursion.

At that point, you may as well adapt isprime to use mpz_probab_prime_p when possible. You only need to check for sure if the result of mpz_probab_prime_p is uncertain:

int isprime(mpz_t num, PrimeTable *table)
{
    mpz_t max, rem, next;
    size_t i, r;
    int status;

    status = mpz_probab_prime_p(num, 50);
    // Status = 2 -> definite yes, Status = 0 -> definite no
    if(status != 1)
        return status != 0;

    mpz_inits(max, rem, NULL);
    mpz_sqrtrem(max, rem, num);

    // Check if perfect square: definitely not prime
    if(!mpz_cmp_si(rem, 0)) {
        mpz_clears(rem, max, NULL);
        return 0;
    }

    mpz_clear(rem);

    /* Normal table lookup */
    for(i = 0; i < table->size; i++) {
        // Got to sqrt(n) -> prime
        if(mpz_cmp(max, table->numbers[i]) < 0) {
            mpz_clear(max);
            return 1;
        }
        // Found a factor -> not prime
        if(mpz_divisible_p(num, table->numbers[i])) {
            mpz_clear(max);
            return 0;
        }
    }

    /* Extend table and do lookup */
    // Start from the last found prime; mpz_nextprime yields the next prime
    // strictly above it (starting from last + 2 could skip a twin prime)
    mpz_init_set(next, table->numbers[i - 1]);

    // Step along probable primes
    for(;;) {
        mpz_nextprime(next, next);
        append_table(table, next);
        if(mpz_divisible_p(num, next)) {
            r = 0;
            break;
        }
        if(mpz_cmp(max, next) <= 0) {
            r = 1;
            break;
        }
    }

    mpz_clears(next, max, NULL);
    return r;
}

Sure enough, this version makes it to n == 79 in a couple of seconds at most. It appears to get stuck on n == 80, probably because mpz_probab_prime_p can't determine for sure whether k is prime, which forces a full trial division. I doubt that computing all the primes up to sqrt(k), somewhere around 10^77 for this k, is going to take a trivial amount of time.

Mad Physicist
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/162605/discussion-on-answer-by-mad-physicist-optimizing-a-primality-test). – Andy Jan 05 '18 at 14:20
  • Left it running overnight and we're still on 15. – Just A Mathematician Jan 05 '18 at 15:52
  • @JustAMathematician. Yeah, I just thought of another optimization. Adding that in now. – Mad Physicist Jan 05 '18 at 16:00
  • @JustAMathematician. I have my best version so far up. I will probably try to make a C implementation too at this point just to C what happens. – Mad Physicist Jan 05 '18 at 16:51
  • Brilliant! `prime_candidates` is especially creative. I like the idea of testing 6n±1. Running now, will respond after a few. – Just A Mathematician Jan 05 '18 at 20:54
  • @JustAMathematician. I've been running since 11am (EST). Still didn't get past 18. Don't expect to for about 4 more days. – Mad Physicist Jan 05 '18 at 21:25
  • @JustAMathematician. I added a C implementation. If you can get libgmp, you should be able to compile and run it with something like `gcc primes.c -lgmp; ./a.out` – Mad Physicist Jan 05 '18 at 21:41
  • @JustAMathematician. I'm running both in parallel, hopefully not on the same core, overnight. I expect the C version will have gotten well into the 20's by tomorrow morning. I probably won't check this computer til Monday though, so it will be interesting to see the results. – Mad Physicist Jan 05 '18 at 21:45
  • I intend to keep the python program running. Will it hinder other work? – Just A Mathematician Jan 05 '18 at 23:19
  • Probably not, especially with multiple cores – Mad Physicist Jan 05 '18 at 23:27
  • Interesting. I must say, I am surprised that C is faster than Python. I don't have experience with C, but from my experience with Python -- a 'mathematical language,' -- I would expect algorithmic runtimes to be top-tier. I assume Java would be even worse. – Just A Mathematician Jan 05 '18 at 23:29
  • 1
    Python is a mathematical language because it has a lot of third party packages like numpy and scipy available to do fast vectorized computations. The thing is that most of the fast code is implemented in C and Fortran. Like most high level interpreted languages, Python statements have a large amount of overhead. C on the other hand gives you very low level access to memory and processing power. Along with a good compiler, that makes it much faster. – Mad Physicist Jan 06 '18 at 02:34
  • 1
    Java is like that too. It's running in a VM, which means overhead. It has an interface to native code, like Python does, so you can write fast math libraries for it too. Java is not as tailored as much for scientific applications for historic reasons, and the are fewer numpy type libraries that I'm aware of. The C interface is also a lot more awkward than Python's in my opinion. – Mad Physicist Jan 06 '18 at 02:44
  • @JustAMathematician. I've added an update of a version that makes it to n=79 in about a second (in C), but I am not sure how many library routines you are OK with. I also don't know how to implement it in Python without installing some external packages. – Mad Physicist Jan 06 '18 at 04:17
  • I would be fine installing external packages (after all, if that's what it takes!) but I don't know how. I tried with numpy and similar but they didn't work. I will download NetBeans soon and try to run the C code. – Just A Mathematician Jan 06 '18 at 04:20
  • If you are using Python, get Anaconda. It will make your life much easier: https://www.anaconda.com/download/. It's a Python platform manager that lets you install things without any admin privileges, set up multiple versions of Python and environments with different sets of libraries. But you don't have to use the fancy stuff if you don't want to. You can just run `pip` to install stuff as usual, and it will work fine too. – Mad Physicist Jan 06 '18 at 04:26