I wrote some code in MATLAB to search for a pattern of prime numbers using the isprime() function, but now I need to do the same in Python. There is, however, no built-in isprime() function in Python, so I want to know which is faster:

1) searching a big file of prime numbers to check whether a given number is prime, or

2) coding a function that computes/tests prime numbers.

Amir.A.G
  • It depends on how big the file is; also, you can never be sure unless you test. – sve Oct 11 '15 at 20:34
  • This question is way too broad. We need a lot more information: how large is the file? How do you test a number for primality? Is the file sorted in any way? etc. – inspectorG4dget Oct 11 '15 at 20:34
  • In addition to what @inspectorG4dget said: please add a [mcve] of your MATLAB code. – Adriaan Oct 11 '15 at 20:36
  • To answer your question: it depends on the size of the file and your programming capabilities. E.g. if you are a complete scrub, searching a file with 10,000 entries by hand might be faster than learning proper programming, whilst I would code it in ~5 minutes and not bother hand-searching. – Adriaan Oct 11 '15 at 20:38

3 Answers


You should test, but to give you some idea of what to expect:

According to Wikipedia (Time complexity of the Sieve of Eratosthenes), the Sieve of Eratosthenes runs in O(n (log n)(log log n)) bit operations. Reading a precomputed list of primes from disk (assuming someone has already done the math) is going to be O(n) operations; in both cases, you can then store the primes in a Python set, after which checking whether a number is prime is O(1). So, in general (assuming you can get the list of primes), reading from disk is going to be faster.
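
To make the comparison concrete, here is a minimal sketch of the sieve-plus-set idea described above; sieve_primes is just an illustrative name, and the limit of 1,000,000 and the value tested are arbitrary choices, not anything from the question:

def sieve_primes(limit):
    """Return a set of all primes <= limit using the Sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Mark every multiple of p, starting at p*p, as composite
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return {n for n in range(limit + 1) if is_prime[n]}

primes = sieve_primes(1000000)   # one-off construction cost
print(104729 in primes)          # each membership test is O(1)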

Note that there are probabilistic algorithms which can determine whether a number is (very likely) prime in far fewer operations, though it sounds like you'd be calling such a test a lot while looking for your patterns. This is what cryptography uses (e.g. Miller–Rabin).

Foon

The answer to this question depends largely on how much memory you're willing to dedicate to either method, the range of primes you wish to search over, and exactly how you'd compute/test your list of primes.

For the sake of comparison I will give two methods:

Method A, which reads a pre-generated list of the first million primes (using a list of primes found here, with the introductory text removed from the file; you could make this more efficient by making the file format easier to parse, but I'm not going to bother);

And Method B, which uses a combination of primality tests: trial division by the first 256 primes to quickly eliminate composite numbers, the Fermat primality test to increase the probability that a candidate is indeed prime, and finally the Miller–Rabin primality test.

Method A: Using a precomputed list of primes

import time, re, random

methodAStartTime = time.time()
with open('primes1.txt') as primeFile:
    # Constructs a set of all primes in the file
    primes = {int(prime) for prime in re.findall(r"\S+", primeFile.read())}

    # Choose 250 random numbers to look up
    # 15485863 is the final prime in the file, and hence the upper limit of our test range
    toTest = [random.randrange(0, 15485863) for i in range(250)]
    foundPrimes = []

    methodATestStartTime = time.time()
    for candidate in toTest:
        if candidate in primes:
            foundPrimes.append(candidate)

    methodAStopTime = time.time()
    print foundPrimes
    print "Method A took %s seconds to construct the list of 1 million primes" % (methodAStopTime - methodAStartTime)
    print "Method A took %s seconds to do 250 lookups" % (methodAStopTime - methodATestStartTime)

The result:

$ python fileLookup.py
[675781, 4715407, 4440523, 14153297, 9673057, 13331299, 4195357, 7219307, 14390969, 1340237, 2875823, 13440881, 954677, 12005717, 11477101, 3783629, 7046069, 5098969, 11821031]
Method A took 1.08314299583 seconds to construct the list of 1 million primes
Method A took 9.79900360107e-05 seconds to do 250 lookups


$ python fileLookup.py
[3051749, 15035039, 7111123, 6855491, 5722303, 9594217, 14680751, 2542607, 4171003, 5353993, 1068131, 1637177, 195893, 617269, 2951497, 5937227, 14768029, 9201733, 4898497, 10333123]
Method A took 1.08329796791 seconds to construct the list of 1 million primes
Method A took 9.70363616943e-05 seconds to do 250 lookups

The upside of Method A is that once you load your list of primes into a Python set, you essentially have incredibly fast, guaranteed-accurate primality tests in O(1) time; the downside is that you first have to obtain the list, and you can only test candidates within the range it covers.

However, if you want the ability to test arbitrarily large numbers, the best way to do it is with a heuristic (probabilistic) approach. Below is some code which takes a given number and tests it for primality (though I'm fairly certain it works, I'd be hesitant to use it elsewhere without thoroughly testing it first).

Method B: Using heuristic primality tests

import random, time

# Maximum number of checks to make when running the fermat primality test
FERMAT_ITERATIONS = 16

# Maximum number of checks to make when running the miller rabin primality test
MILLER_RABIN_ITERATIONS = 32

# List of the first 256 prime numbers for quickly eliminating composite numbers
SIEVE_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101,
                103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199,
                211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317,
                331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443,
                449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577,
                587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701,
                709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839,
                853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983,
                991, 997, 1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061, 1063, 1069, 1087, 1091, 1093,
                1097, 1103, 1109, 1117, 1123, 1129, 1151, 1153, 1163, 1171, 1181, 1187, 1193, 1201, 1213, 1217, 1223,
                1229, 1231, 1237, 1249, 1259, 1277, 1279, 1283, 1289, 1291, 1297, 1301, 1303, 1307, 1319, 1321, 1327,
                1361, 1367, 1373, 1381, 1399, 1409, 1423, 1427, 1429, 1433, 1439, 1447, 1451, 1453, 1459, 1471, 1481,
                1483, 1487, 1489, 1493, 1499, 1511, 1523, 1531, 1543, 1549, 1553, 1559, 1567, 1571, 1579, 1583, 1597,
                1601, 1607, 1609, 1613, 1619)


def _subsetSievePrimalityTest(n):
    """
    Tests the divisibility of n by each of the primes in SIEVE_PRIMES.

    Inputs: n - The value to test for primality, using SIEVE_PRIMES.

    Returns True if n is deemed to be a candidate prime.
    """

    for prime in SIEVE_PRIMES:
        if not n % prime and n != prime:
            return False
    return True


def _millerRabinPrimalityTest(n):
    """
    Runs the Miller-Rabin primality test to assert within probable means that n is a prime number.

    Inputs: n: The number to test for primality

    Note: Assumes n > 4.

    Returns True if n is a candidate prime, else False.
    """

    r = n - 1
    s = 0

    while not r & 1:
        r //= 2
        s += 1

    if not r:
        return False

    for i in range(MILLER_RABIN_ITERATIONS):
        randA = random.randint(2, n - 2)
        if pow(randA, r, n) != 1:
            for j in range(s):
                if pow(randA, pow(2, j) * r, n) == n - 1:
                    break
            else:
                return False

    return True


def _fermatPrimalityTest(n):
    """
    Runs the Fermat Primality test to quickly discard non-prime candidates.

    Inputs: n: The number to test for primality

    Note: Assumes n > 2

    Returns True if n is a candidate prime, else False.
    """

    for i in range(FERMAT_ITERATIONS):
        randA = random.randint(2, n - 1)
        if pow(randA, n - 1, n) != 1:
            return False

    return True


def _testPrime(n):
    """ Tests if n is a prime number, to a high probability of accuracy. """

    # Run the subsetSievePrimalityTest to quickly eliminate most candidates
    if not _subsetSievePrimalityTest(n):
        return False

    # Run the fermat primality test to increase the probability that n is a prime
    if not _fermatPrimalityTest(n):
        return False

    # Run the Miller-Rabin primality test to increase the likelihood that n is a prime number.
    if not _millerRabinPrimalityTest(n):
        return False

    # If all the primality tests return True, n is very likely a prime.
    return True

# Perform 250 primality tests
primes = []
before = time.time()
for i in range(250):
    n = random.randrange(5, 10000000000000000000000000000000000000000000000000000000000000000000000)
    if _testPrime(n):
        primes.append(n)
print primes
print "250 tests took %s seconds." % (time.time() - before)

The results:

$ python primes.py
[1397851093443935893893199704415888634420504940995831429393995086101779L, 7170601962705551861928144990317179138094235542914256028587854974779083L]
250 tests took 0.0264971256256 seconds.

$ python primes.py
[8357905131314475918050404529147549250259829943945760080686207992488773L, 9373617169959281562405528688440072510316310402630308017260952694184939L, 5558837883632138346976786452679446510211389690872968737719419795450487L]
250 tests took 0.0443639755249 seconds.

The upside of Method B is that you can test a number of any size at any time without needing to do any work beforehand; the downside is that it's technically possible for composite numbers to be incorrectly classified as prime, and each individual test takes significantly longer than a Method A lookup.

At the end of the day, Method A is better if you know the range of primes you're looking for and intend to do many, many lookups. Method B is only better if you want to test arbitrary numbers, or if the range of numbers you wish to test is very large and you want to conserve both disk space and memory.

Warren Spencer

First of all: check whether you really need a fast algorithm! Maybe your numbers are small enough that the time difference compared to simply running the Sieve of Eratosthenes every time won't even be noticeable.

If that's not the case, then there are basically three cases here:

  • If the list in the file is small enough that you can read it once and keep it entirely in memory (e.g. in a set), lookups are O(1).
  • If the list in the file is too long to keep in memory, you'll have to binary search the file, and that's O(log(n)) time (see the sketch after this list).
  • If you don't have any list, well... take your time, because for big numbers that's going to be slow :-)
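
As a rough illustration of the binary-search case, here is a sketch that assumes a hypothetical primes.txt storing one prime per line, zero-padded to a fixed width of RECORD_WIDTH digits plus a single '\n', so record offsets can be computed directly; most downloadable prime lists are not in this format, so you'd have to convert them first:

import os

RECORD_WIDTH = 12  # digits per line, zero-padded

def is_prime_in_file(n, path='primes.txt'):
    """Binary search a sorted, fixed-width file of primes: O(log n) seeks."""
    record_size = RECORD_WIDTH + 1                 # digits plus the newline
    num_records = os.path.getsize(path) // record_size
    with open(path, 'rb') as f:
        lo, hi = 0, num_records - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            f.seek(mid * record_size)              # jump straight to a record
            value = int(f.read(RECORD_WIDTH))
            if value == n:
                return True
            elif value < n:
                lo = mid + 1
            else:
                hi = mid - 1
    return False

# Usage (requires a file in the format described above): is_prime_in_file(104729)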

Maybe you could do something like keeping a small enough list in memory. If the number is too big for that list, check the file. If it's still too big, use a primality test to determine whether it's prime (this layered approach is sketched below).

After that, save the possibly newly found prime in the list for future reference.

Maybe also save it if it's not a prime number, otherwise for big numbers you'll have to do it all over again just to find a negative result.
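
Here is a sketch of that layered approach; SMALL_PRIMES, FILE_PRIMES, FILE_LIMIT and the two helper functions are hypothetical stand-ins that you would replace with your real in-memory list, file lookup (e.g. the binary search above) and probabilistic primality test (e.g. the Miller-Rabin code in the other answer):

import math

SMALL_PRIMES = {2, 3, 5, 7, 11, 13}                  # tiny list kept in memory
FILE_PRIMES = {17, 19, 23, 29, 31, 37, 41, 43, 47,   # stand-in for the on-disk list
               53, 59, 61, 67, 71, 73, 79, 83, 89, 97}
FILE_LIMIT = 100                                     # largest value the "file" covers

def is_prime_in_file(n):
    return n in FILE_PRIMES                          # replace with a real file lookup

def primality_test(n):
    # Trial-division stand-in; use a probabilistic test for genuinely big n
    return n > 1 and all(n % d for d in range(2, int(math.sqrt(n)) + 1))

known_primes = set(SMALL_PRIMES)      # cache of confirmed primes
known_composites = set()              # negative results are cached as well

def is_prime(n):
    if n in known_primes:
        return True
    if n in known_composites:
        return False
    if n <= FILE_LIMIT:
        result = is_prime_in_file(n)
    else:
        result = primality_test(n)
    # Remember the answer either way, so big numbers aren't re-tested later
    (known_primes if result else known_composites).add(n)
    return result

print(is_prime(97))        # answered via the "file", then cached
print(is_prime(104729))    # answered via the primality test, then cached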

ChatterOne