Python 2 lists of positive integers finding prime number

Question

Given 2 lists of positive integers, find how many ways you can select a number from each of the lists such that their sum is a prime number.

My code is tooo slow As i have both list1 and list 2 containing 50000 numbers each. So any way to make it faster so it solves it in minutes instead of days?? :)

    # 2 is the only even prime number
    if n == 2: return True

    # all other even numbers are not primes
    if not n & 1: return False

    # range starts with 3 and only needs to go 
    # up the squareroot of n for all odd numbers
    for x in range(3, int(n**0.5)+1, 2):
        if n % x == 0: return False

    return True



for i2 in l2:
    for i1 in l1:
        if isprime(i1 + i2):
            n = n + 1 # increasing number of ways
            s = "{0:02d}: {1:d}".format(n, i1 + i2)
            print(s) # printing out

This is a job for [dynamic programming](http://en.wikipedia.org/wiki/Dynamic_programming). — zwol, Oct 12 '13 at 21:39
I don't see any way in which dynamic programming techniques could help here. There's no apparent sub-structure of any exploitable kind here - e.g., whether i1+j1 is prime is independent of whether i2+j2 is prime. — Tim Peters, Oct 13 '13 at 02:55

score 2 · Accepted Answer · edited May 23 '17 at 12:28

Sketch:

Following @Steve's advice, first figure out all the primes <= max(l1) + max(l2). Let's call that list primes. Note: primes doesn't really need to be a list; you could instead generate primes up the max one at a time.
Swap your lists (if necessary) so that l2 is the longest list. Then turn that into a set: l2 = set(l2).
Sort l1 (l1.sort()).

Then:

for p in primes:
    for i in l1:
        diff = p - i
        if diff < 0:
            # assuming there are no negative numbers in l2;
            # since l1 is sorted, all diffs at and beyond this
            # point will be negative
            break
        if diff in l2:
           # print whatever you like
           # at this point, p is a prime, and is the
           # sum of diff (from l2) and i (from l1)

Alas, if l2 is, for example:

l2 = [2, 3, 100000000000000000000000000000000000000000000000000]

this is impractical. It relies on that, as in your example, max(max(l1), max(l2)) is "reasonably small".

Fleshed out

Hmm! You said in a comment that the numbers in the lists are up to 5 digits long. So they're less than 100,000. And you said at the start that the list have 50,000 elements each. So they each contain about half of all possible integers under 100,000, and you're going to have a very large number of sums that are primes. That's all important if you want to micro-optimize ;-)

Anyway, since the maximum possible sum is less than 200,000, any way of sieving will be fast enough - it will be a trivial part of the runtime. Here's the rest of the code:

def primesum(xs, ys):
    if len(xs) > len(ys):
        xs, ys = ys, xs
    # Now xs is the shorter list.
    xs = sorted(xs)  # don't mutate the input list
    sum_limit = xs[-1] + max(ys)  # largest possible sum
    ys = set(ys)     # make lookups fast
    count = 0
    for p in gen_primes_through(sum_limit):
        for x in xs:
            diff = p - x
            if diff < 0:
                # Since xs is sorted, all diffs at and
                # beyond this point are negative too.
                # Since ys contains no negative integers,
                # no point continuing with this p.
                break
            if diff in ys:
                #print("%s + %s = prime %s" % (x, diff, p))
                count += 1
    return count

I'm not going to supply my gen_primes_through(), because it's irrelevant. Pick one from the other answers, or write your own.

Here's a convenient way to supply test cases:

from random import sample
xs = sample(range(100000), 50000)
ys = sample(range(100000), 50000)
print(primesum(xs, ys))

Note: I'm using Python 3. If you're using Python 2, use xrange() instead of range().

Across two runs, they each took about 3.5 minutes. That's what you asked for at the start ("minutes instead of days"). Python 2 would probably be faster. The counts returned were:

219,334,097

and

219,457,533

The total number of possible sums is, of course, 50000**2 == 2,500,000,000.

About timing

All the methods discussed here, including your original one, take time proportional to the product of two lists' lengths. All the fiddling is to reduce the constant factor. Here's a huge improvement over your original:

def primesum2(xs, ys):
    sum_limit = max(xs) + max(ys)  # largest possible sum
    count = 0
    primes = set(gen_primes_through(sum_limit))
    for i in xs:
        for j in ys:
            if i+j in primes:
                # print("%s + %s = prime %s" % (i, j, i+j))
                count += 1
    return count

Perhaps you'll understand that one better. Why is it a huge improvement? Because it replaces your expensive isprime(n) function with a blazing fast set lookup. It still takes time proportional to len(xs) * len(ys), but the "constant of proportionality" is slashed by replacing a very expensive inner-loop operation with a very cheap operation.

And, in fact, primesum2() is faster than my primesum() in many cases too. What makes primesum() faster in your specific case is that there are only around 18,000 primes less than 200,000. So iterating over the primes (as primesum() does) goes a lot faster than iterating over a list with 50,000 elements.

A "fast" general-purpose function for this problem would need to pick different methods depending on the inputs.

no I have a premade list of different numbers. i need find how many ways I can select a number from each of the lists such that their sum is a prime number. — user1111111111111, Oct 12 '13 at 22:02
@WinCd, yes, I understood that. You apparently don't understand, me, though :-( Sorry, but I'm out of time for now - try reading it again? And if it's still unclear, be very explicit about which part(s) you don't understand. — Tim Peters, Oct 12 '13 at 22:03
@WinCd, see my edit just now. Can't get clearer than showing the code in full ;-) — Tim Peters, Oct 13 '13 at 00:27
what the above posted code did in 20 hours your solution did in < 10 minutes. But thr is just one problem. In few cases like 1 out of 10 it gives incorrect answer. If you would like to chk the list with which it gave wrong answer pls msg me — user1111111111111, Oct 15 '13 at 00:32
Sorry, I don't know how to send a message to anyone here. What you "should do" to debug is have two different algorithms, and run them over many, many **small** inputs. When they disagree about a result, then you can easily investigate why. Nobody will be able to guess by staring at lists with 50,000 elements. — Tim Peters, Oct 15 '13 at 01:04

score 1 · Answer 2 · answered Oct 12 '13 at 21:42

You should use the Sieve of Eratosthenes to calculate prime numbers.

You are also calculating the prime numbers for each possible combination of sums. Instead, consider finding the maximum value you can achieve with the sum from the lists. Generate a list of all the prime numbers up to that maximum value.

Whilst you are adding up the numbers, you can see if the number appears in your prime number list or not.

hughdbrown · Answer 3 · 2013-10-12T23:09:54.070

I would find the highest number in each range. The range of primes is the sum of the highest numbers.

Here is code to sieve out primes:

def eras(n):
    last = n + 1
    sieve = [0, 0] + list(range(2, last))
    sqn = int(round(n ** 0.5))
    it = (i for i in xrange(2, sqn + 1) if sieve[i])
    for i in it:
        sieve[i * i:last:i] = [0] * (n // i - i + 1)
    return filter(None, sieve)

It takes around 3 seconds to find the primes up to 10 000 000. Then I would use the same n ^ 2 algorithm you are using for generating sums. I think there is an n logn algorithm but I can't come up with it.

It would look something like this:

from collections import defaultdict
possible = defaultdict(int)
for x in range1:
    for y in range2:
        possible[x + y] += 1

def eras(n):
    last = n + 1
    sieve = [0, 0] + list(range(2, last))
    sqn = int(round(n ** 0.5))
    it = (i for i in xrange(2, sqn + 1) if sieve[i])
    for i in it:
        sieve[i * i:last:i] = [0] * (n // i - i + 1)
    return filter(None, sieve)

n = max(possible.keys())
primes = eras(n)
possible_primes = set(possible.keys()).intersection(set(primes))

for p in possible_primes:
    print "{0}: {1} possible ways".format(p, possible[p])

My list contains random numbers not in series like 10,20,30 Its more like 199,250,986 — user1111111111111, Oct 12 '13 at 22:47
@Win Cd: That does not matter. Find the max in each range of numbers. Generate the possible sums (n-squared calculation). Generate the primes up to the maximum needed (sum of maximums in each range). Intersect the two sets. — hughdbrown, Oct 12 '13 at 22:57
@hughdbrown, an elegant approach! It runs about 8x slower than the one I posted on "his kind of" test data (two lists each containing a random sample of 50000 integers from range(100000)), but is faster than mine for less extreme cases. Have you gotten anywhere with your `n log n` intuitions? That would be *real* improvement :-) — Tim Peters, Oct 13 '13 at 03:24

Python 2 lists of positive integers finding prime number

3 Answers3

Fleshed out

About timing