Finding the Nth Twin Prime

Question

I was trying to solve a problem on SPOJ. We are required to calculate the nth twin prime pair( primes differing by 2). n can be as large as 10^5. I tried a precalculation using a sieve, I had to sieve up to 10^8 to get the maximum n twin prime, but the time limit is strict(2s) and it times out. I noticed people have solved it in 0.00 seconds, so i looked around for a formula on google, and couldnt get anything helpful. Could someone please guide me?

Thanks in advance!!

Which sieve were you using? Sieve of Atkin should do it within time limits I think. — phimuemue, Apr 13 '12 at 15:10
If there was a closed formula for the `nth` twin prime, we'd know whether there is a finite or infinite number of them too. That in itself is probably worth a Fields medal. — biziclop, Apr 13 '12 at 15:12
I used the sieve of Eratosthenes. I havent tried the Atkin Sieve. Will get back to you on that. Thanks.. — frodo, Apr 13 '12 at 15:21
@frodo I tried an implementation of mine which runs not under 2s (when sieving up to 10^8), but I would be interested how you solved your problem. — phimuemue, Apr 13 '12 at 15:23
[100000th twin prime pair](http://www.wolframalpha.com/input/?i=100000+th+twin+prime+pair) is less than 2*10^7. That's 1/5th of your sieve size. — Will Ness, Apr 15 '12 at 23:54
yep, plain vector sieve of E. on 10,000,000 odds with integrated twin primes tracking (and memoization) completed in 1.05 sec on SPOJ, 0.18 sec on Ideone. About the 0.0 sec solutions, the code size limit on this one is 50 MB, maybe the full list of twin primes just fits in there. — Will Ness, Apr 16 '12 at 02:03
(no, that's 50 **KB** ; around 1MB of code is needed to write down the pre-calculated lowest primes of each twin pair - so it's a question of devising a compression to fit it into the 50KB source code limit). — Will Ness, Apr 16 '12 at 07:24
@WillNess: interestingly, Wolfram Alpha seems off by one for this one. It returns the same pair `10196267 10196269` for `59999 th twin prime pair` and `60000 th twin prime pair`... — chqrlie, Sep 19 '16 at 18:28
@chqrlie yes, strange indeed. well, here the point was to find out just the order of magnitude... (i.e. no need for precision). — Will Ness, Sep 19 '16 at 20:10
@WillNess: no need for precision indeed, but if WA can be off by one, how can you tell if it is not off by 1 billion? I reported the bug, and I am curious to see their reaction. — chqrlie, Sep 19 '16 at 20:42
@chqrlie if I get ACCEPT from SPOJ that's my confirmation. :) :) — Will Ness, Sep 20 '16 at 09:22

Priyank Bhatnagar · Answer 1 · 2012-04-17T10:15:20.447

I have got AC in 0.66s. As, there are solutions with 0.0s I assume better optimizations are possible, however, I describe my approach here.

I have used one basic optimization in Sieve of Eratosthenes. You know that 2 is the only even prime, using this you can reduce your computation time and memory for calculating primes by half.

Secondly, all the numbers which are twin primes will not be multiples of 2 and 3 (as they are primes!). So, those numbers will be of the form 6N+1 and 6N+5 (rest will not be primes for sure). 6N+5 = 6N+6-1 = 6(N+1)-1. So it can be seen that 6N+1 and 6N-1 can possibly be twin primes for N >= 1. So, you precompute all these values using the primes that you have calculated before. (Trivial case is 3 5)

Note: You don't need to calculate primes till 10^8, the upper limit is much lower. [Edit: I can share my code if you want, but it would be better if you come up with a solution on your own. :)]

their FAQ says 0.00 solutions are bugs. (also, you've got a typo, `6N+6-1 = 6(N **+** 1)-1`). — Will Ness, Apr 17 '12 at 09:45

Daniel Fischer · Answer 2 · 2012-04-16T08:08:50.527

Out of curiosity, I solved the problem using two variants of a Sieve of Eratosthenes. The first variant completed on the testing machine in 0.93s and the second in 0.24s. For comparison, on my computer, the first finished in 0.08s and the second in 0.04s.

The first was a standard sieve on the odd numbers, the second a slightly more elaborate sieve omitting also the multiples of 3 in addition to the even numbers.

The testing machines of SPOJ are old and slow, so a programme runs much longer on them than on a typical recent box; and they have small caches, therefore it is important to keep the computation small.

Doing that, a Sieve of Eratosthenes is easily fast enough. However, it is really important to keep memory usage small. The first variant, using one byte per number, gave "Time limit exceeded" on SPOJ, but ran in 0.12s on my box. So, given the characteristics of the SPOJ testing machines, use a bit-sieve to solve it in the given time.

On the SPOJ machine, I got a significant speedup (running time 0.14s) by further reducing the space of the sieve by half. Since - except for the first pair (3,5) - all prime twins have the form (6*k-1, 6*k+1), and you need not know which of the two numbers is composite if k doesn't give rise to a twin prime pair, it is sufficient to sieve only the indices k.

(6*k + 1 is divisible by 5 if and only if k = 5*m + 4 for some m, and 6*k - 1 is divisible by 5 if and only if k = 5*m+1 for some m, so 5 would mark off 5*m ± 1, m >= 1 as not giving rise to twin primes. Similarly, 6*k+1 is divisible by 13 if and only if k = 13*m + 2 for some m and 6*k - 1 if and only if k = 13*m - 2 for some m, so 13 would mark off 13*m ± 2.)

This doesn't change the number of markings, so with a sufficiently large cache, the change in running time is small, but for small caches, it's a significant speedup.

One more thing, though. Your limit of 10⁸ is way too high. I used a lower limit (20 million) that doesn't overestimate the 100,000^th twin prime pair by so much. With a limit of 10⁸, the first variant would certainly not have finished in time, the second probably not.

With the reduced limit, a Sieve of Atkin needs to be somewhat optimised to beat the Eratosthenes variant omitting even numbers and multiples of 3, a naive implementation will be significantly slower.

Some remarks concerning your (wikipedia's pseudocode) Atkin sieve:

#define limit 100000000
int prime1[MAXN];
int prime2[MAXN];

You don't need the second array, the larger partner of a prime twin pair can easily be computed from the smaller. You're wasting space and destroy cache locality reading from two arrays. (That's minor compared to the time needed for sieving, though.)

    int root = ceil(sqrt(limit));
    bool sieve[limit];

On many operating systems nowadays, that is an instant segfault, even with a reduced limit. The stack size is often limited to 8MB or less. Arrays of that size should be allocated on the heap.

As mentioned above, using one bool per number makes the programme run far slower than necessary. You should use a std::bitset or std::vector<bool> or twiddle the bits yourself. Also it is advisable to omit at least the even numbers.

    for (int x = 1; x <= root; x++)
    {
        for (int y = 1; y <= root; y++)
        {
//Main part of Sieve of Atkin
            int n = (4*x*x)+(y*y);
            if (n <= limit && (n % 12 == 1 || n % 12 == 5)) sieve[n] ^= true;
            n = (3*x*x)+(y*y);
            if (n <= limit && n % 12 == 7) sieve[n] ^= true;
            n = (3*x*x)-(y*y);
            if (x > y && n <= limit && n % 12 == 11) sieve[n] ^= true;
        }
    }

This is horribly inefficient. It tries far too many x-y-combinations, for each combination it does three or four divisions to check the remainder modulo 12 and it hops back and forth in the array.

Separate the different quadratics.

For 4*x^2 + y^2, it is evident that you need only consider x < sqrt(limit)/2 and odd y. Then the remainder modulo 12 is 1, 5, or 9. If the remainder is 9, then 4*x^2 + y^2 is actually a multiple of 9, so such a number would be eliminated as not square-free. However, it is preferable to omit the multiples of 3 from the sieve altogether and treat the cases n % 12 == 1 and n % 12 == 5 separately.

For 3*x^2 + y^2, it is evident that you need only consider x < sqrt(limit/3) and a little bit of thought reveals that x must be odd and y even (and not divisible by 3).

For 3*x^2 - y^2 with y < x, it is evident that you need only consider y < sqrt(limit/2). Looking at the remainders modulo 12, you see that y mustn't be divisible by 3 and x and y must have different parity.

When I tried it on Ideone.com once, for some reason the [plain vector on odds](http://ideone.com/fapob)-based SoE was *better* than *bitset* memory-wise, with ~ same speed for under 32 mln and even faster than it for bigger sieve sizes. Finding the n-th twin pair can be incorporated right into the sieve itself quite easily. — Will Ness, Apr 16 '12 at 00:33
Ah, yeah, `vector`. That is actually - on most implementations at least, don't know whether the standard prescribes it - a bit vector, like `UArray Int Bool`. I'm a C guy (when I'm not a Haskell guy), I always use raw arrays, it's much simpler. — Daniel Fischer, Apr 16 '12 at 08:07

Will Ness · Accepted Answer · 2012-07-27T15:38:01.603

So basically, sieving up to 20,000,000 is enough, according to Wolfram Alpha. Use plain sieve of Eratosthenes, on odds, with vector<bool> in C++ (what language were you using BTW?).

Track the twin primes right inside the sieve loop. Store the lower prime of a pair in a separate vector as you find the twins, and if an out-of-order (smaller then previous) index is requested (and they are, contrary to the examples shown on the description page), just get the prime from this storage:

size_t n = 10000000, itop=2236;
vector<bool> s;
vector<int> twins;
s.resize(n, true);
int cnt, k1, k2, p1=3, p2, k=0;
cin >> cnt;
if( cnt-- > 0 )
{
    cin >> k1;
    for( size_t i=1; i < n; ++i )  // p=2i+1
    {
        if( s[i] )
        {
            p2 = 2*i+1;
            if( p2-p1 == 2 ) { ++k; twins.push_back(p1); }
            if( k==k1 )
            { 
                cout << p1 << " " << p2 << endl;
                ......

etc. Got accept with 1.05 sec (0.18 sec on Ideone). Or untangle the logic - just pre-calculate 100,000 twin prime pairs right away, and access them in a separate loop afterwards (0.94 sec).

score 1 · Answer 4 · answered Apr 18 '12 at 12:00

1

A description of an efficient algorithm to solve this can be found here @ Programming Praxis entry Also, Scheme and Perl sample code are provided.

answered Apr 18 '12 at 12:00

evandrix

6,041
4
27
38

score 0 · Answer 5 · answered Apr 13 '12 at 18:15

I precomputed a large list of primes using the Sieve of Eratosthenes, then iterated through the list counting items that were 2 less than their successor until finding n of them. Runs in 1.42 seconds at http://ideone.com/vYjuC. I too would like to know how to compute the answer in zero seconds.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ISBITSET(x, i) (( x[i>>3] & (1<<(i&7)) ) != 0)
#define SETBIT(x, i) x[i>>3] |= (1<<(i&7));
#define CLEARBIT(x, i) x[i>>3] &= (1<<(i&7)) ^ 0xFF;

typedef struct list {
    int data;
    struct list *next;
} List;

List *insert(int data, List *next)
{
    List *new;

    new = malloc(sizeof(List));
    new->data = data;
    new->next = next;
    return new;
}

List *reverse(List *list) {
    List *new = NULL;
    List *next;

    while (list != NULL)
    {
        next = list->next;
        list->next = new;
        new = list;
        list = next;
    }

    return new;
}

int length(List *xs)
{
    int len = 0;
    while (xs != NULL)
    {
        len += 1;
        xs = xs->next;
    }
    return len;
}

List *primes(int n)
{
    int m = (n-1) / 2;
    char b[m/8+1];
    int i = 0;
    int p = 3;
    List *ps = NULL;
    int j;

    ps = insert(2, ps);

    memset(b, 255, sizeof(b));

    while (p*p < n)
    {
        if (ISBITSET(b,i))
        {
            ps = insert(p, ps);
            j = (p*p - 3) / 2;
            while (j < m)
            {
                CLEARBIT(b, j);
                j += p;
            }
        }
        i += 1; p += 2;
    }

    while (i < m)
    {
        if (ISBITSET(b,i))
        {
            ps = insert(p, ps);
        }
        i += 1; p += 2;
    }

    return reverse(ps);
}

int nth_twin(int n, List *ps)
{
    while (ps->next != NULL)
    {
        if (n == 0)
        {
            return ps->data - 1;
        }

        if (ps->next->data - ps->data == 2)
        {
            --n;
        }

        ps = ps->next;
    }

    return 0;
}

int main(int argc, char *argv[])
{
    List *ps = primes(100000000);

    printf("%d\n", nth_twin(100000, ps));

    return 0;
}

This times out as well. The meaning of a 2s time limit is that the time taken over all the test cases should be less than 2 seconds. This isnt. — frodo, Apr 13 '12 at 20:36
why prepend to list and then reverse when it's possible to maintain the tail and append to it? why build the list at all if all you need is to sweep the sieve, once - which you do anyway, while *building* it? — Will Ness, Apr 15 '12 at 23:59
the SPOJ FAQ says that `0.0s` entries is a bug. btw Ideone is about 5.5x faster than SPOJ. We could store the precomputed twins in a source code somehow, but it's a minimum of 100,000 bytes, and the size limit on source code is 50K. I wonder how much it'd take, e.g. as Huffman-encoded string, and would this leave enough space to have the decoder in the source too? — Will Ness, Apr 18 '12 at 18:14

score 0 · Answer 6 · answered Apr 17 '12 at 12:36

this is what I have attempted. I have a string of TLEs.

bool mark [N];
vector <int> primeList;

 void sieve ()
 {
memset (mark, true, sizeof (mark));
mark [0] = mark [1] = false;

for ( int i = 4; i < N; i += 2 )
    mark [i] = false;

for ( int i = 3; i * i <= N; i++ )
{
    if ( mark [i] )
    {
        for ( int j = i * i; j < N; j += 2 * i )
            mark [j] = false;
    }
}

primeList.clear ();
primeList.push_back (2);

for ( int i = 3; i < N; i += 2 )
{
    if ( mark [i] )
        primeList.push_back (i);
}

//printf ("%d\n", primeList.size ());
 }

  int main ()
{
sieve ();

vector <int> twinPrime;

for ( size_t i = 1; i < primeList.size (); i++ )
{
    if ( primeList [i] - primeList [i - 1] == 2 )
        twinPrime.push_back (primeList [i - 1]);
}

int t;
scanf("%d",&t);
int s;
while ( t-- )
{
    scanf("%d",&s);
    printf ("%d %d\n", twinPrime [s - 1], twinPrime [s - 1] + 2);
}

return 0;

}

`for ( int i = 3; i * i <= N; i **+= 2** )`. Use `vector mark; mark.resize(N+1,true);`, it's an automatic bit-sieve (1/8th of memory size). Don't bother marking evens, and don't read from them either. Do not build `primesList`, build `twinprimes` directly instead, using `prev_prime` auxiliary variable, in the loop. This should run just under 2 sec there hopefully. If not, use this trick: treat an `i`th entry in your `mark` array as standing not for the number `i`, but `2i+1` instead. Your array will be half in size. This is what I did, and it ran at 1.0 sec on SPOJ. — Will Ness, Apr 17 '12 at 15:59
if you really can't figure out how to make this half-sized array for odds only, [check this out](http://stackoverflow.com/questions/10179837/optimization-of-algorithm/10180394#10180394) for an example. — Will Ness, Apr 17 '12 at 16:09

score 0 · Answer 7 · answered Aug 15 '17 at 14:56

Here is a procedure that could answer your question:

Prime numbers that, when divided by 3, have equal quotients when corrected to decimal 0 (zero) are Twin Primes.

This can be written as

For any pair of prime numbers Px, Py, if [Px/3, 0] = [Py/3, 0] then Px and Py are Prime Twins.

The basis for this is that if prime numbers differ by 2, then dividing the all the prime numbers of interest will yield unique equal quotients when the quotients are corrected to decimal zero. Primes that are not separated by 2 will not have equal quotients when corrected to decimal zero.

For example:

• 11, 13 when divided by 3 will yield unique the unique quotient of 4 when the quotient is corrected to decimal zero.

• 17, 19 when divided by 3 will yield the unique quotient of 6 when the quotient is corrected to decimal zero.

• 29, 31 when divided by 3 will yield the unique quotient of 10 when the quotient is corrected to decimal zero.

Etc.

Below is a simple procedure using Excel to:

• Find prime twins from any list of primes • Find twin primes in any range of primes • Find the largest prime twin prime • Find gaps between twin primes

Import Kutools into Excel
List prime numbers of interest into column 1.
Insert divisor 3 in column 2 - fill down to the level of the largest prime on the list in column 1.
Divide the first row of column 1 by the first row of column 2 and place the quotient in column 3
Fill down column 3 to the level of the largest prime number on the list in column 1.
Correct to zero decimal. Keep the numbers column 3 (quotients) selected.
From “Conditional formatting’ - Select "duplicate values" from the menu
Go to Kutools and select 'to actual' - This will highlight the cells of all the twin pairs scattered in the Quotient column 3.
Select the quotients in column 3
Select 'Sort and Filter' in Excel
Select 'Custom Sort'
Fill in the menu (For values chose the highlighted color in the quotient column) and and click ‘OK”.
The twin primes will be grouped together in the column. This list can then be used to find the gaps between primes.

To find the largest twin prime use the above procedure with a range of the largest known prime into column 1 (e.g. the highest 10k primes).

If a prime twin is not found in this range, then go to the next lowest range until a twin prime is found. This will be the largest twin prime.

Hope this helps.

Finding the Nth Twin Prime

7 Answers7

Linked