4

I'm working on something that requires me to generate all prime numbers up to 10^12.

Since I have never needed this many primes before, I normally just implement the algorithm on this webpage here

The problem here is, of course, that 10^12 is larger than the max value for an integer, and as a result I cannot make an array of that size.

I am unfamiliar with the methods one would use to generate so many primes efficiently, and was wondering if anyone could shed some light on the situation.

Thirumalai murugan
  • 5,698
  • 8
  • 32
  • 54
Tim
  • 143
  • 1
  • 3
  • 2
    You don't need an array of all the numbers, just the primes. Also, since `10¹² > Integer.MAX_VALUE`, use `Long` instead. – Matt Ball Jul 06 '13 at 15:44
  • @MattBall How do you sieve without all the numbers? – zw324 Jul 06 '13 at 15:46
  • @MattBall Unfortunately, Java does not allow arrays to have more than Integer.MAX_VALUE elements, and I don't think that is enough even for the expected number of primes. – Patricia Shanahan Jul 06 '13 at 15:53
  • 1
    @PatriciaShanahan If Java had no such limit, I can hardly imagine fitting an array of 10^12 `long`s into RAM. – Marko Topolnik Jul 06 '13 at 16:06
  • @MarkoTopolnik Indeed. Even the array of primes would be a stretch. – Patricia Shanahan Jul 06 '13 at 16:52
  • You don't need `long`, one bit per candidate prime number is enough. If you use wheel factorization with 2, 3 and 5 then you have 8 candidates for every 30 numbers, so about 33GB would suffice. – starblue Jul 06 '13 at 17:36
  • What are you working on? Perhaps there is some other way to solve the problem rather than generating so many primes. You are talking about 37607912018 (that's 37.6 billion) primes. – user448810 Jul 06 '13 at 20:51

3 Answers3

10

You will need to use a segmented sieve.

The basic idea of a segmented sieve is to choose the sieving primes less than the square root of n, choose a reasonably large segment size that nevertheless fits in memory, and then sieve each of the segments in turn, starting with the smallest. At the first segment, the smallest multiple of each sieving prime that is within the segment is calculated, then multiples of the sieving prime are marked as composite in the normal way; when all the sieving primes have been used, the remaining unmarked numbers in the segment are prime. Then, for the next segment, for each sieving prime you already know the first multiple in the current segment (it was the multiple that ended the sieving for that prime in the prior segment), so you sieve on each sieving prime, and so on until you are finished.

Consider the example of sieving from 100 to 200 in segments of 20; the 5 sieving primes are 3, 5, 7, 11 and 13. In the first segment from 100 to 120, the bitarray has 10 slots, with slot 0 corresponding to 101, slot k corresponding to 100 + 2k + 1, and slot 9 corresponding to 119. The smallest multiple of 3 in the segment is 105, corresponding to slot 2; slots 2+3=5 and 5+3=8 are also multiples of 3. The smallest multiple of 5 is 105 at slot 2, and slot 2+5=7 is also a multiple of 5. The smallest multiple of 7 is 105 at slot 2, and slot 2+7=9 is also a multiple of 7. And so on.

Function primes takes arguments lo, hi and delta; lo and hi must be even, with lo < hi, and lo must be greater than the square root of hi. The segment size is twice delta. Array ps of length m contains the sieving primes less than the square root of hi, with 2 removed since even numbers are ignored, calculated by the normal Sieve of Eratosthenes. Array qs contains the offset into the sieve bitarray of the smallest multiple in the current segment of the corresponding sieving prime. After each segment, lo advances by twice delta, so the number corresponding to an index i of the sieve bitarray is lo + 2 i + 1.

function primes(lo, hi, delta)
    sieve := makeArray(0..delta-1)
    ps := tail(primes(sqrt(hi)))
    m := length(ps)
    qs := makeArray(0..m-1)
    for i from 0 to m-1
        qs[i] := (-1/2 * (lo + ps[i] + 1)) % ps[i]
    while lo < hi
        for i from 0 to delta-1
            sieve[i] := True
        for i from 0 to m-1
            for j from qs[i] to delta step ps[i]
                sieve[j] := False
            qs[i] := (qs[i] - delta) % ps[i]
        for i from 0 to delta-1
            t := lo + 2*i + 1
            if sieve[i] and t < hi
                output t
        lo := lo + 2*delta

For the sample given above, this is called as primes(100, 200, 10). In the example given above, qs is initially [2,2,2,10,8], corresponding to smallest multiples 105, 105, 105, 121 and 117, and is reset for the second segment to [1,2,6,0,11], corresponding to smallest multiples 123, 125, 133, 121 and 143.

The value of delta is critical; you should make delta as large as possible so long at it fits in cache memory, for speed. Use your language's library for the bitarray, so that you only take a single bit for each sieve location. If you need a simple Sieve of Eratosthenes to calculate the sieving primes, this is my favorite:

function primes(n)
    sieve := makeArray(2..n, True)
    for p from 2 to n step 1
        if sieve(p)
            output p
            for i from p * p to n step p
                sieve[i] := False

Those functions are both in pseudocode; you'll have to translate to Java, with appropriate integer data types. Where the pseudocode says output you can print the prime, or collect the primes in an array, whatever you want to do with them.

I've done a lot of work with primes at my blog, including the essay Programming with Prime Numbers which includes a segmented sieve on the last page.

user448810
  • 17,381
  • 4
  • 34
  • 59
0

The real solution is to find another way to solve the underlying problem, without generating the complete set of primes. According to the Prime Number Theorem the average gap between the primes is ln(1e12), about 27.6. That gives an estimate of over 39e9 primes less than 1e12.

You probably do not need all of them. Consider investigating ways of generating probable primes and/or prime number testing. Of course, it is impossible to know exactly what to do without knowing the underlying problem you are trying to solve.

Patricia Shanahan
  • 25,849
  • 4
  • 38
  • 75
0

Here is my Java code for computing segments of primes:

/**
 * Computes the primes in a range using the sieve of Eratosthenes.
 * The size of the range must not exceed Integer.MAX_VALUE.
 *
 * @param start  The start index of the prime sieve.
 * @param limit  Primes will be sieved up to but not including this limit.
 *
 * @return  A bit set representing the integer range from start to limit.
 *     Each bit in this set is set to true if and only if
 *     the corresponding integer is prime.
 */
public static BitSet computePrimes(long start, long limit)
{
    if (limit - start > Integer.MAX_VALUE)
    {
        throw new IllegalArgumentException();
    }

    final long sqrtLimit = sqrtCeil(limit);
    final BitSet primes = computePrimes((int) sqrtLimit);

    final BitSet segment = new BitSet();
    if (0 - start >= 0)
    {
        segment.set((int) (0 - start), false);
    }
    if (1 - start >= 0)
    {
        segment.set((int) (1 - start), false);
    }
    segment.set((int) (Math.max(0, 2 - start)), (int) (limit - start), true);
    for (int d = 2; d < sqrtLimit; d++)
    {
        if (primes.get(d))
        {
            final int remainder = (int) (start % d);
            final long mStart = start - remainder + (remainder == 0 ? 0 : d);
            for (long m = Math.max(mStart, d * d); m < limit; m += d)
            {
                segment.clear((int) (m - start));
            }
        }
    }
    return segment;
}

It needs a standard sieve to compute the primes for sieving segments (it computes it anew for every segment, you should change that):

/**
 * Computes the primes using the sieve of Eratosthenes.
 *
 * @param limit  Primes will be sieved up to but not including this limit.
 *
 * @return  A bit set where exactly the elements with prime index
 *     are set to true.
 */
public static BitSet computePrimes(int limit)
{
    final BitSet primes = new BitSet();
    primes.set(0, false);
    primes.set(1, false);
    primes.set(2, limit, true);
    for (int d = 2; d < sqrtCeil(limit); d++)
    {
        if (primes.get(d))
        {
            for (int m = d * d; m < limit; m += d)
            {
                primes.clear(m);
            }
        }
    }
    return primes;
}

Note that wheel factorization can speed this up by a factor of three. See also this answer, the basic sieve is the same.

Community
  • 1
  • 1
starblue
  • 55,348
  • 14
  • 97
  • 151