7

How to find number of pairs of consecutive prime numbers having difference of 6 like (23,29) from 1 to 2 billion (using any programming language and without using any external libraries) with considering time complexity?

  1. Tried sieve of eratosthenes but getting consecutive primes is challenge

  2. Used generators but time complexity is very high

The code is:

def gen_numbers(n):
    for ele in range(1,n+1):
        for i in range(2,ele//2):
            if ele%i==0:
                break
        else:
            yield ele
    prev=0
    count=0
    for i in gen_numbers(2000000000):
        if i-prev==6:
            count+=1
        prev = i
Martin Evans
  • 45,791
  • 17
  • 81
  • 97
Anand
  • 85
  • 2
  • 7
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/198324/discussion-on-question-by-anand-no-of-pairs-of-consecutive-prime-numbers-having). – Andy Aug 22 '19 at 18:10

6 Answers6

2

Interesting question! I have recently been working on Sieve of Eratosthenes prime generators. @Hans Olsson says

You should use segmented sieve to avoid memory issue: en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Segmented_sieve

I agree, and happen to have one which I hacked to solve this question. Apologies in advance for the length and non Pythonic-ness. Sample output:

$ ./primes_diff6.py 100
7 prime pairs found with a difference of 6.
( 23 , 29 ) ( 31 , 37 ) ( 47 , 53 ) ( 53 , 59 ) ( 61 , 67 ) ( 73 , 79 ) ( 83 , 89 )
25 primes found.
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 
83, 89, 97]
$ ./primes_diff6.py 1e5
1940 prime pairs found with a difference of 6.
9592 primes found.

The code:

#!/usr/bin/python -Wall
# program to find all primes smaller than n, using segmented sieve
# see https://github.com/kimwalisch/primesieve/wiki/Segmented-sieve-of-Eratosthenes

import sys

def segmentedSieve(limit):

    sqrt = int(limit ** 0.5)
    segment_size = sqrt

    prev = 0
    count = 0

    # we sieve primes >= 3
    i = 3
    n = 3

    sieve       = []
    is_prime    = [True] * (sqrt + 1)
    primes      = []
    multiples   = []
    out_primes  = []
    diff6       = []

    for low in xrange(0, limit+1, segment_size):
        sieve = [True] * segment_size

        # current segment = [low, high]
        high = min(low + segment_size -1, limit)

        # add sieving primes needed for the current segment
        #   using a simple sieve of Eratosthenese, starting where we left off
        while i * i <= high:
            if is_prime[i]:
                primes.append(i)
                multiples.append(i * i - low)

                two_i = i + i
                for j in xrange(i * i, sqrt, two_i):
                    is_prime[j] = False
            i += 2

        # sieve the current segment
        for x in xrange(len(primes)):
            k = primes[x] * 2
            j = multiples[x]

            while j < segment_size:     # NB: "for j in range()" doesn't work here.
                sieve[j] = False
                j += k

            multiples[x] = j - segment_size

        # collect results from this segment
        while n <= high:
            if sieve[n - low]:
                out_primes.append(n)
                if n - 6 == prev:
                    count += 1
                    diff6.append(n)
                prev = n
            n += 2

    print count, "prime pairs found with a difference of 6."
    if limit < 1000:
        for x in diff6:
            print "(", x-6, ",", x, ")",
        print
    return out_primes

# Driver Code
if len(sys.argv) < 2:
    n = 500
else:
    n = int(float(sys.argv[1]))

primes = [2] + segmentedSieve(n)

print len(primes), "primes found."
if n < 1000:
    print primes

This might work as-is if you run it for size 2e9 (2 billion) and subtract the result of size 1e9 (1 billion).

EDIT

Performance info, requested by @ValentinB.

$ time ./primes_diff6.py 2e9 
11407651 prime pairs found with a difference of 6. 
98222287 primes found. 

real 3m1.089s 
user 2m56.328s 
sys  0m4.656s

... on my newish laptop, 1.6 GHz i5-8265U, 8G RAM, Ubuntu on WSL, Win10

I found a mod 30 prime wheel here in a comment by Willy Good that is about 3x faster than this code at 1e9, about 2.2x faster at 2e9. Not segmented, the guts is a Python generator. I'm wondering if it could be segmented or changed to use a bit array to help its memory footprint without otherwise destroying its performance.

END EDIT

Greg Ames
  • 146
  • 4
  • There is no chance of the off-by-one error that you mentioned, as the next prime larger than a billion is 1000000007. – user448810 Aug 22 '19 at 13:26
  • @user448810 You are right. I verified and edited my answer. – Greg Ames Aug 22 '19 at 14:10
  • There are 14 prime pairs less than 100 with a difference of 6: (5,11), (7,13), (11,17), (13,19), (17,23), (23,29), (31,37), (37,43), (41,47), (47,53), (61,67), (67,73), (73,79), (83,89). – user448810 Aug 22 '19 at 14:23
  • Have you tried running it for a large limit ? I'm curious to see how it performs. My code yields the same values as yours for n=100 and n=100000, but takes very long to run for n=2000000000. I upvoted your answer because it is the only one showing the actual implementation of a working sieve. – Valentin B. Aug 22 '19 at 14:39
  • @ValentinB. Thanks much! I am no longer a stackoverflow virgin ;-) Yes I have run it for large N. But I didn't want to give away the answer. However, you have to run it for 10**9 also to get the complete answer. So here it is: $ time ./primes_diff6.py 2e9 11407651 prime pairs found with a difference of 6. 98222287 primes found. real 3m1.089s user 2m56.328s sys 0m4.656s – Greg Ames Aug 22 '19 at 17:10
  • @user448810 The OP asked for pairs of _consecutive_ primes with a diff of 6, making the problem a little easier. – Greg Ames Aug 22 '19 at 17:24
  • @user448810 I've added an answer along the lines of what I understood as your intention. Thanks for the lesson! – גלעד ברקן Aug 23 '19 at 04:28
1

This will require storing all primes from 0 to sqrt(2000000000) so memory wise it's not optimal but maybe this will do for you ? You're going to have to go for a more complex sieve if you want to be more efficient though.

n = 2000000000
last_prime = 3
prime_numbers_to_test = [2, 3]
result = 0

for i in range(5, n, 2):
    for prime in prime_numbers_to_test:
        # Not prime -> next
        if i % prime == 0:
            break

    else:
        # Prime, test our condition
        if i - last_prime == 6:
            result += 1

        last_prime = i

        if i**2 < n:
            prime_numbers_to_test.append(i)

print(result)

EDIT This code yielded a result of 11,407,651 pairs of consecutive primes with a difference of 6 for n = 2,000,000,000

Valentin B.
  • 602
  • 6
  • 18
1

Here's a demonstration along the lines of what I understood as user448810's intention in their comments. We use the primes to mark off, meaning sieve, only relevant numbers in the range. Those are numbers of the form 6k + 1 and 6k - 1.

Python 2.7 code:

# https://rosettacode.org/wiki/Modular_inverse
def extended_gcd(aa, bb):
    lastremainder, remainder = abs(aa), abs(bb)
    x, lastx, y, lasty = 0, 1, 1, 0
    while remainder:
        lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
        x, lastx = lastx - quotient*x, x
        y, lasty = lasty - quotient*y, y
    return lastremainder, lastx * (-1 if aa < 0 else 1), lasty * (-1 if bb < 0 else 1)

def modinv(a, m):
    g, x, y = extended_gcd(a, m)
    if g != 1:
        raise ValueError
    return x % m


from math import sqrt, ceil

n = 2000000000
sqrt_n = int(sqrt(n))

A = [True] * (sqrt_n + 1)

for i in xrange(2, sqrt_n // 2):
  A[2*i] = False

primes = [2]

for i in xrange(3, sqrt_n, 2):
  if A[i]:
    primes.append(i)
    c = i * i
    while c <= sqrt_n:
      A[c] = False
      c = c + i 

print "Primes with a Difference of 6"
print "\n%s initial primes to work from." % len(primes)

lower_bound = 1000000000
upper_bound = 1000001000
range = upper_bound - lower_bound

print "Range: %s to %s" % (lower_bound, upper_bound)

# Primes of the form 6k + 1

A = [True] * (range // 6 + 1)

least_6k_plus_1 = int(ceil((lower_bound - 1) / float(6)))
most_6k_plus_1 = (upper_bound - 1) // 6

for p in primes[2:]:
  least = modinv(-6, p)
  least_pth_over = int(least + p * ceil((least_6k_plus_1 - least) / float(p)))
  c = int(least_pth_over - least_6k_plus_1)
  while c < len(A):
    A[c] = False
    c = c + p

print "\nPrimes of the form 6k + 1:"

for i in xrange(1, len(A)):
  if A[i] and A[i - 1]:
    p1 = (i - 1 + least_6k_plus_1) * 6 + 1
    p2 = (i + least_6k_plus_1) * 6 + 1
    print p1, p2


# Primes of the form 6k - 1

A = [True] * (range // 6 + 1)

least_6k_minus_1 = int(ceil((lower_bound + 1) / float(6)))
most_6k_minus_1 = (upper_bound + 1) // 6

for p in primes[2:]:
  least = modinv(6, p)
  least_pth_over = int(least + p * ceil((least_6k_minus_1 - least) / float(p)))
  c = int(least_pth_over - least_6k_minus_1)
  while c < len(A):
    A[c] = False
    c = c + p

print "\nPrimes of the form 6k - 1:"

for i in xrange(1, len(A)):
  if A[i] and A[i - 1]:
    p1 = (i - 1 + least_6k_minus_1) * 6 - 1
    p2 = (i + least_6k_minus_1) * 6 - 1
    print p1, p2

Output:

Primes with a Difference of 6

4648 initial primes to work from.
Range: 1000000000 to 1000001000

Primes of the form 6k + 1:
1000000087 1000000093
1000000447 1000000453
1000000453 1000000459

Primes of the form 6k - 1:
1000000097 1000000103
1000000403 1000000409
1000000427 1000000433
1000000433 1000000439
1000000607 1000000613

In order to count consecutive primes, we have to take into account the interleaving lists of primes 6k + 1 and 6k - 1. Here's the count:

# https://rosettacode.org/wiki/Modular_inverse
def extended_gcd(aa, bb):
    lastremainder, remainder = abs(aa), abs(bb)
    x, lastx, y, lasty = 0, 1, 1, 0
    while remainder:
        lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
        x, lastx = lastx - quotient*x, x
        y, lasty = lasty - quotient*y, y
    return lastremainder, lastx * (-1 if aa < 0 else 1), lasty * (-1 if bb < 0 else 1)

def modinv(a, m):
    g, x, y = extended_gcd(a, m)
    if g != 1:
        raise ValueError
    return x % m


from math import sqrt, ceil
import time

start = time.time()

n = 2000000000
sqrt_n = int(sqrt(n))

A = [True] * (sqrt_n + 1)

for i in xrange(2, sqrt_n // 2):
  A[2*i] = False

primes = [2]

for i in xrange(3, sqrt_n, 2):
  if A[i]:
    primes.append(i)
    c = i * i
    while c <= sqrt_n:
      A[c] = False
      c = c + i 

lower_bound = 1000000000
upper_bound = 2000000000
range = upper_bound - lower_bound

A = [True] * (range // 6 + 1)

least_6k_plus_1 = int(ceil((lower_bound - 1) / float(6)))
most_6k_plus_1 = (upper_bound - 1) // 6

for p in primes[2:]:
  least = modinv(-6, p)
  least_pth_over = int(least + p * ceil((least_6k_plus_1 - least) / float(p)))
  c = int(least_pth_over - least_6k_plus_1)
  while c < len(A):
    A[c] = False
    c = c + p

B = [True] * (range // 6 + 1)

least_6k_minus_1 = int(ceil((lower_bound + 1) / float(6)))
most_6k_minus_1 = (upper_bound + 1) // 6

for p in primes[2:]:
  least = modinv(6, p)
  least_pth_over = int(least + p * ceil((least_6k_minus_1 - least) / float(p)))
  c = int(least_pth_over - least_6k_minus_1)
  while c < len(B):
    B[c] = False
    c = c + p

total = 0

for i in xrange(1, max(len(A), len(B))):
  if A[i] and A[i - 1] and not B[i]:
    total = total + 1
  if B[i] and B[i - 1] and not A[i - 1]:
    total = total + 1

# 47374753 primes in range 1,000,000,000 to 2,000,000,000
print "%s consecutive primes with a difference of 6 in range %s to %s." % (total, lower_bound, upper_bound)
print "--%s seconds" % (time.time() - start)

Output:

5317860 consecutive primes with a difference of 6 in range 1000000000 to 2000000000.
--193.314619064 seconds
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
1

Python isn't the best language to write this in, but since that's what we're all doing...

This little segmented sieve finds the answer 5317860 in 3:24

import math

# Find primes < 2000000000

sieve = [True]*(int(math.sqrt(2000000000))+1)

for i in range(3,len(sieve)):
    if (sieve[i]):
        for j in range(2*i, len(sieve), i):
            sieve[j] = False

smallOddPrimes = [i for i in range(3,len(sieve),2) if sieve[i]]

# Check primes in target segments

total=0
lastPrime=0

for base in range(1000000000, 2000000000, 10000000):
    sieve = [True]*5000000
    for p in smallOddPrimes:
        st=p-(base%p)
        if st%2==0: #first odd multiple of p
            st+=p
        for i in range(st//2,len(sieve),p):
            sieve[i]=False
    for prime in [i*2+base+1 for i in range(0,len(sieve)) if sieve[i]]:
        if prime == lastPrime+6:
            total+=1
        lastPrime = prime

print(total)
Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
1

There are a number of ways to compute the sexy primes between one billion and two billion. Here are four.

Our first solution identifies sexy primes p by using a primality test to check both p and p + 6 for primality:

def isSexy(n):
    return isPrime(n) and isPrime(n+6)

Then we check each odd number from one billion to two billion:

counter = 0
for n in xrange(1000000001, 2000000000, 2):
    if isSexy(n): counter += 1

print counter

That takes an estimated two hours on my machine, determined by running it from 1 billion to 1.1 billion and multiplying by 10. We need something better. Here's the complete code:

Python 2.7.9 (default, Jun 21 2019, 00:38:53) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def isqrt(n): # newton
...   x = n; y = (x + 1) // 2
...   while y < x:
...     x = y
...     y = (x + n // x) // 2
...   return x
... 
>>> def isSquare(n):
...   # def q(n):
...   # from sets import Set
...   # s, sum = Set(), 0
...   # for x in xrange(0,n):
...   #   t = pow(x,2,n)
...   #   if t not in s:
...   #     s.add(t)
...   #     sum += pow(2,t)
...   # return sum
...   # q(32) => 33751571
...   # q(27) => 38348435
...   # q(25) => 19483219
...   # q(19) =>   199411
...   # q(17) =>   107287
...   # q(13) =>     5659
...   # q(11) =>      571
...   # q(7)  =>       23
...   # 99.82% of non-squares
...   # caught by filters before
...   # square root calculation
...   if 33751571>>(n%32)&1==0:
...     return False
...   if 38348435>>(n%27)&1==0:
...     return False
...   if 19483219>>(n%25)&1==0:
...     return False
...   if   199411>>(n%19)&1==0:
...     return False
...   if   107287>>(n%17)&1==0:
...     return False
...   if     5659>>(n%13)&1==0:
...     return False
...   if      571>>(n%11)&1==0:
...     return False
...   if       23>>(n% 7)&1==0:
...     return False
...   s = isqrt(n)
...   if s * s == n: return s
...   return False
... 
>>> def primes(n): # sieve of eratosthenes
...     i, p, ps, m = 0, 3, [2], n // 2
...     sieve = [True] * m
...     while p <= n:
...         if sieve[i]:
...             ps.append(p)
...             for j in range((p*p-3)/2, m, p):
...                 sieve[j] = False
...         i, p = i+1, p+2
...     return ps
... 
>>> pLimit,pList = 0,[]
>>> pLen,pMax = 0,0
>>> 
>>> def storePrimes(n):
...   # call with n=0 to clear
...   global pLimit, pList
...   global pLen, pMax
...   if n == 0:
...     pLimit,pList = 0,[]
...     pLen,pMax = 0,0
...   elif pLimit < n:
...     pLimit = n
...     pList = primes(n)
...     # x=primesRange(pLimit,n)
...     # pList += x
...     pLen = len(pList)
...     pMax = pList[-1]
... 
>>> storePrimes(1000)
>>> def gcd(a, b): # euclid
...   if b == 0: return a
...   return gcd(b, a%b)
... 
>>> def kronecker(a,b):
...   # for any integers a and b
...   # cohen 1.4.10
...   if b == 0:
...     if abs(a) == 1: return 1
...     else: return 0
...   if a%2==0 and b%2==0:
...     return 0
...   tab2=[0,1,0,-1,0,-1,0,1]
...   v = 0
...   while b%2==0: v,b=v+1,b/2
...   if v%2==0: k = 1
...   else: k = tab2[a%8]
...   if b < 0:
...     b = -b
...     if a < 0: k = -k
...   while True:
...     if a == 0:
...       if b > 1: return 0
...       else: return k
...     v = 0
...     while a%2==0: v,a=v+1,a/2
...     if v%2==1: k*=tab2[b%8]
...     if (a&b&2): k = -k
...     r=abs(a); a=b%r; b=r
... 
>>> def jacobi(a,b):
...   # for integers a and odd b
...   if b%2 == 0:
...     m="modulus must be odd"
...     raise ValueError(m)
...   return kronecker(a,b)
... 
>>> def isSpsp(n,a,r=-1,s=-1):
...   # strong pseudoprime
...   if r < 0:
...     r, s = 0, n - 1
...     while s % 2 == 0:
...       r, s = r + 1, s / 2
...   if pow(a,s,n) == 1:
...     return True
...   for i in range(0,r):
...     if pow(a,s,n) == n-1:
...       return True
...     s = s * 2
...   return False
... 
>>> def lucasPQ(p, q, m, n):
...   # nth element of lucas
...   # sequence with parameters
...   # p and q (mod m); ignore
...   # modulus operation when
...   # m is zero
...   def mod(x):
...     if m == 0: return x
...     return x % m
...   def half(x):
...     if x%2 == 1: x=x+m
...     return mod(x / 2)
...   un, vn, qn = 1, p, q
...   u=0 if n%2==0 else 1
...   v=2 if n%2==0 else p
...   k=1 if n%2==0 else q
...   n,d = n//2, p*p-4*q
...   while n > 0:
...     u2 = mod(un*vn)
...     v2 = mod(vn*vn-2*qn)
...     q2 = mod(qn*qn)
...     n2 = n // 2
...     if n % 2 == 1:
...       uu = half(u*v2+u2*v)
...       vv = half(v*v2+d*u*u2)
...       u,v,k = uu,vv,k*q2
...     un,vn,qn,n = u2,v2,q2,n2
...   return u, v, k
... 
>>> def isSlpsp(n):
...   # strong lucas pseudoprime
...   def selfridge(n):
...     d,s = 5,1
...     while True:
...       ds = d * s
...       if gcd(ds, n) > 1:
...         return ds, 0, 0
...       if jacobi(ds,n) == -1:
...         return ds,1,(1-ds)/4
...       d,s = d+2, -s
...   d, p, q = selfridge(n)
...   if p == 0: return n == d
...   s, t = 0, n + 1
...   while t % 2 == 0:
...     s, t = s + 1, t / 2
...   u,v,k = lucasPQ(p,q,n,t)
...   if u == 0 or v == 0:
...     return True
...   for r in range(1, s):
...     v = (v*v-2*k) % n
...     if v == 0: return True
...     k = (k * k) % n
...   return False
... 
>>> def isPrime(n):
...   # mathematica method
...   if n < 2: return False
...   for p in pList[:25]:
...     if n%p == 0: return n==p
...   if isSquare(n):
...     return False
...   r, s = 0, n - 1
...   while s % 2 == 0:
...     r, s = r + 1, s / 2
...   if not isSpsp(n,2,r,s):
...     return False
...   if not isSpsp(n,3,r,s):
...     return False
...   if not isSlpsp(n):
...     return False
...   return True
...
>>> def isSexy(n):
...   return isPrime(n) and isPrime(n+6)
... 
>>> counter = 0
>>> for n in xrange(1000000001, 2000000000, 2):
...   if isSexy(n): counter += 1
... 
>>> print counter
5924680

By the way, if you want to see how slow Python is for things like this, here is the equivalent program in Pari/GP, a programming environment designed for number-theoretic calculations, which finishes in 70253 milliseconds, just over a minute:

gettime(); c = 0;
forprime(p = 1000000000, 2000000000, if (isprime(p+6), c++));
print (c); print (gettime());
5924680
70253

Our second solution uses our standard prime generator to generate the primes from one billion to two billion, checking for each prime p if p - 6 is on the list:

counter = 0
ps = primeGen(1000000000)
p2 = next(ps); p1 = next(ps); p = next(ps)
while p < 2000000000:
    if p - p2 == 6 or p - p1 == 6:
        counter += 1
    p2 = p1; p1 = p; p = next(ps)

print counter

That took about 8.5 minutes and produced the correct result. Here's the complete code:

Python 2.7.9 (default, Jun 21 2019, 00:38:53) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def primeGen(start=0):
...   if start <= 2: yield 2
...   if start <= 3: yield 3
...   ps = primeGen()
...   p=next(ps); p=next(ps)
...   q = p*p; D = {}
...   def add(m,s):
...     while m in D: m += s
...     D[m] = s
...   while q <= start:
...     x = (start // p) * p
...     if x < start: x += p
...     if x%2 == 0: x += p
...     add(x, p+p)
...     p=next(ps); q=p*p
...   c = max(start-2, 3)
...   if c%2 == 0: c += 1
...   while True:
...     c += 2
...     if c in D:
...       s = D.pop(c)
...       add(c+s, s)
...     elif c < q: yield c
...     else: # c == q
...       add(c+p+p, p+p)
...       p=next(ps); q=p*p
... 
>>> counter = 0
>>> ps = primeGen(1000000000)
>>> p2 = next(ps); p1 = next(ps); p = next(ps)
>>> while p < 2000000000:
...     if p - p2 == 6 or p - p1 == 6:
...         counter += 1
...     p2 = p1; p1 = p; p = next(ps)
... 
p>>> print counter
5924680

If you will permit me another digression on Pari/GP, here is the equivalent program using a prime generator, computing the solution in just 37 seconds:

gettime(); p2 = nextprime(1000000000); p1=nextprime(p2+1); c = 0;
forprime(p = nextprime(p1+1), 2000000000,
    if (p-p2==6 || p-p1==6, c++); p2=p1; p1=p);
print (c); print (gettime());
5924680
37273

Our third solution uses a segmented sieve to make a list of primes from one billion to two billion, then scans the list counting the sexy primes:

counter = 0
ps = primes(1000000000, 2000000000)
if ps[1] - ps[0] == 6: counter += 1

for i in xrange(2,len(ps)):
    if ps[i] - ps[i-2] == 6 or ps[i] - ps[i-1] == 6:
        counter += 1

print counter

That takes about four minutes to run, and produces the correct result. Here's the complete code:

Python 2.7.9 (default, Jun 21 2019, 00:38:53) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def isqrt(n): # newton
...   x = n; y = (x + 1) // 2
...   while y < x:
...     x = y
...     y = (x + n // x) // 2
...   return x
... 
>>> def primes(lo, hi=False):
...   if not hi: lo,hi = 0,lo
...   
...   if hi-lo <= 50:
...     xs = range(lo,hi)
...     return filter(isPrime,xs)
...   
...   # sieve of eratosthenes
...   if lo<=2 and hi<=1000000:
...     i,p,ps,m = 0,3,[2],hi//2
...     sieve = [True] * m
...     while p <= hi:
...      if sieve[i]:
...       ps.append(p)
...       s = (p*p-3)/2
...       for j in xrange(s,m,p):
...        sieve[j] = False
...      i,p = i+1, p+2
...     return ps
...   
...   if lo < isqrt(hi):
...     r = isqrt(hi) + 1
...     loPs=primes(lo,r)
...     hiPs=primes(r+1,hi)
...     return loPs + hiPs
...   
...   # segmented sieve
...   if lo%2==1: lo-=1
...   if hi%2==1: hi+=1
...   r = isqrt(hi) + 1
...   b = r//2; bs = [True] * b
...   ps = primes(r)[1:]
...   qs = [0] * len(ps); zs = []
...   for i in xrange(len(ps)):
...     q = (lo+1+ps[i]) / -2
...     qs[i]= q % ps[i]
...   for t in xrange(lo,hi,b+b):
...     if hi<(t+b+b): b=(hi-t)/2
...     for j in xrange(b):
...       bs[j] = True
...     for k in xrange(len(ps)):
...       q,p = qs[k], ps[k]
...       for j in xrange(q,b,p):
...         bs[j] = False
...       qs[k] = (qs[k]-b)%ps[k]
...     for j in xrange(b):
...       if bs[j]:
...         zs.append(t+j+j+1)
...   return zs
... 
>>> counter = 0
>>> ps = primes(1000000000, 2000000000)
>>> for i in xrange(2, len(ps)):
...     if ps[i] - ps[i-2] == 6 or ps[i] - ps[i-1] == 6:
...         counter += 1
... 
>>> print counter
5924680

Here is our fourth and final program to count the sexy primes, suggested in the comments above. The idea is to sieve each of the polynomials 6 n + 1 and 6 n − 1 separately, scan for adjacent pairs, and combine the counts.

This is a little bit tricky, so let's look at an example: sieve 6 n + 1 on the range 100 to 200 using the primes 5, 7, 11 and 13, which are the sieving primes less than 200 (excluding 2 and 3, which divide 6). The sieve has 17 elements 103, 109, 115, 121, 127, 133, 139, 145, 151, 157, 163, 169, 175, 181, 187, 193, 199. The least multiple of 5 in the list is 115, so we strike 115, 145 and 175 (every 5th item) from the sieve. The least multiple of 7 in the list is 133, so we strike 133 and 175 (every 7th item) from the sieve. The least multiple of 11 in the list is 121, so we strike 121 and 187 (every 11th item) from the list. And the least multiple of 13 in the list is 169, so we strike it from the list (it's in the middle of a 17-item list, and has no other multiples in the list). The primes that remain in the list are 103, 109, 127, 139, 151, 157, 163, 181, 193, and 199; of those, 103, 151, 157 and 193 are sexy.

The trick is finding the offset in the sieve of the first multiple of the prime. The formula is (lo + p) / -6 (mod p), where lo is the first element in the sieve (103 in the example above) and p is the prime; -6 comes from the gap between successive elements of the sieve. In modular arithmetic, division is undefined, so we can't just divide by -6; instead, we find the modular inverse. And since modular inverse is undefined for negative numbers, we first convert -6 to its equivalent mod p. Thus, for our four sieving primes, the offsets into the sieve are:

((103 +  5) * inverse(-6 %  5,  5)) %  5 =  2  ==> points to 115
((103 +  7) * inverse(-6 %  7,  7)) %  7 =  5  ==> points to 133
((103 + 11) * inverse(-6 % 11, 11)) % 11 =  3  ==> points to 121
((103 + 13) * inverse(-6 % 13, 13)) % 13 = 11  ==> points to 169

Sieving sieve 6 n - 1 works the same way, except that lo is 101 instead of 103; the sieve contains 101, 107, 113, 119, 125, 131, 137, 143, 149, 155, 161, 167, 173, 179, 185, 191, 197:

((101 +  5) * inverse(-6 %  5,  5)) %  5 =  4  ==> points to 125
((101 +  7) * inverse(-6 %  7,  7)) %  7 =  3  ==> points to 119
((101 + 11) * inverse(-6 % 11, 11)) % 11 =  7  ==> points to 143
((101 + 13) * inverse(-6 % 13, 13)) % 13 =  7  ==> points to 143

After sieving, the numbers that remain in the sieve are 101, 107, 113, 131, 137, 149, 167, 173, 179, 191, 197, of which 101, 107, 131, 167, 173 and 191 are sexy, so there are 10 sexy primes between 100 and 200.

Here's the code:

Python 2.7.9 (default, Jun 21 2019, 00:38:53) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def isqrt(n): # newton
...   x = n; y = (x + 1) // 2
...   while y < x:
...     x = y
...     y = (x + n // x) // 2
...   return x
... 
>>> def inverse(x, m): # euclid
...   a, b, u = 0, m, 1
...   while x > 0:
...     x,a,b,u=b%x,u,x,a-b//x*u
...   if b == 1: return a % m
...   return 0 # must be coprime
... 
>>> def primes(n): # sieve of eratosthenes
...     i, p, ps, m = 0, 3, [2], n // 2
...     sieve = [True] * m
...     while p <= n:
...         if sieve[i]:
...             ps.append(p)
...             for j in range((p*p-3)/2, m, p):
...                 sieve[j] = False
...         i, p = i+1, p+2
...     return ps
... 
>>> counter = 0
>>> ps = primes(isqrt(2000000000))[2:]
>>> size = (2000000000-1000000000)/6+1
>>> 
>>> # sieve on 6n-1
... lo = 1000000000/6*6+5
>>> sieve = [True] * size
>>> for p in ps:
...     q = ((lo+p)*inverse(-6%p,p))%p
...     for i in xrange(q,size,p):
...         sieve[i] = False
... 
>>> for i in xrange(1,size):
...     if sieve[i] and sieve[i-1]:
...         counter += 1
... 
>>> # sieve on 6n+1
... lo += 2
>>> sieve = [True] * size
>>> for i in xrange(0,size):
...     sieve[i] = True
... 
>>> for p in ps:
...     q = ((lo+p)*inverse(-6%p,p))%p
...     for i in xrange(q,size,p):
...         sieve[i] = False
... 
>>> for i in xrange(1,size):
...     if sieve[i] and sieve[i-1]:
...         counter += 1
... 
>>> print counter
5924680

That took about three minutes to run and produced the correct result.

If you are determined to count only consecutive primes that differ by 6, instead of counting all the sexy primes, the easiest way is to use the segmented sieve of the third method and change the predicate in the counting test to look only at ps[i] - ps[i-1] == 6. Or you can do this in just 22 seconds in Pari/GP:

gettime(); prev = nextprime(1000000000); c = 0;
forprime(p = nextprime(prev+1), 2000000000, if (p-prev==6, c++); prev=p);
print (c); print (gettime());
5317860
22212
user448810
  • 17,381
  • 4
  • 34
  • 59
0

First of all, I build a sieve; you need check primes only through sqrt(limit). This takes less that 7 minutes on my aging desktop (Intel Haswell ... yes, that out of date).

With this, finding the pairs is trivial. Check each odd number and its desired partner. I've also printed the time used at 1000-pair intervals.

NOTE: if the problem is, indeed, to count only consecutive primes, then remove the check against prime_list[idx+2].

from time import time

start = time()
limit = 10**9 * 2

prime_list = build_sieve(limit)
print("sieve built in", round(time()-start, 2), "sec")

count = 0
for idx, p in enumerate(prime_list[:-2]):
    if prime_list[idx+1] == p+6 or \
       prime_list[idx+2] == p+6:
        count += 1

print(count, "pairs found in", round(time()-start, 2), "sec")

Output:

sieve built in 417.01 sec
12773727 pairs found in 481.23 sec

That's about 7 minutes to build the sieve, another minute to count the pairs. This is with base Python. If you use NumPy, you can shift the sieve by one and two positions, do the vectorized subtractions, and count how many times 6 appears in the results.

Prune
  • 76,765
  • 14
  • 60
  • 81
  • Where does your `build_sieve` function come from ? Also my solution yielded 11407651, I don't know which is right though. – Valentin B. Aug 22 '19 at 13:59
  • @ValentinB. I confirm 12773727 pairs. I suspect your solution only looked at the previous prime; sometimes you have to look at the _second_ previous prime to find a difference of 6. – user448810 Aug 22 '19 at 14:14
  • @user448810 one might argue that to be the wrong choice *here*, as the question asks for "*consecutive* prime numbers having [the] difference of 6". :) – Will Ness Aug 22 '19 at 14:25
  • @WillNess Indeed I tried to answer the question as it was worded by OP. So I guess my solution (although very long to run) does solve for the problem if the wording corresponds to what OP wants. – Valentin B. Aug 22 '19 at 14:32
  • @ValentinB. THe sieve comes from whatever sieve method you choose to use. Mine is the one I build for solving Project Euler problems. A sieve is a trivial search, i.e. out of scope for Stack Overflow. – Prune Aug 22 '19 at 14:40
  • @ValentinB. You're right: the OP specifies *consecutive* primes, which would disqualify 11-17 and many other pairs. My solution above can be redeemed by removing the check against `prime_list[idx+2]`. – Prune Aug 22 '19 at 14:41
  • @Prune You are right by saying the sieve implementation is a very common problem, but it is common to provide a hyperlink to some ressource that could at least help OP, because as is your answer does not fully answer the question. – Valentin B. Aug 22 '19 at 14:45
  • 2
    @Prune "A sieve is a trivial search, i.e. out of scope for Stack Overflow"? really? :) there are [1,733 questions on SO](https://stackoverflow.com/search?q=sieve+is%3Aq) containing the word "sieve". it even got its own tag on SO, more than one, even. :) – Will Ness Aug 22 '19 at 14:53
  • 1
    @WIllNess: Hi Will, Phil here, haven't seen you here for a while. I assumed the original poster intended the standard mathematician's definition of sexy primes ([WikiPedia](https://en.wikipedia.org/wiki/Sexy_prime), [MathWorld](http://mathworld.wolfram.com/SexyPrimes.html), [OEIS](https://oeis.org/A023201)) but stated it incorrectly. It would be unusual to restrict the task to consecutive primes. – user448810 Aug 22 '19 at 14:54
  • @WillNess: exactly my point: searching out a sieve solution for OP is out of scope -- we expect OP to do that research before posting. – Prune Aug 22 '19 at 14:56
  • @user448810 nice seeing you here too, Phil ! :) :) yeah, that's always a possibility... ( the question is complex BTW, mentions not using any external libs, mentions time complexity... ) thanks for the links! – Will Ness Aug 22 '19 at 14:59
  • 1
    @Prune actually, they mention not using any external libraries, in the question. Looks like that means Sieve code must be included in the answer! :) – Will Ness Aug 22 '19 at 15:01
  • @WillNess That's a *non sequitur*. A researched, hand-coded sieve avoids using an external library. OP is welcome to insert any likely-looking sieve. As that issue is covered *quite* well in other places, I do not include it here, per SO guidelines. – Prune Aug 23 '19 at 14:59
  • Ok. I think I addressed the interleaving lists of primes `6k+1` and `6k-1`. My program counted 5317860 *consecutive* primes with a difference of 6 in range 1000000000 to 2000000000 in 3 minutes and 13 seconds. If you have a change, can you please confirm this result? (cc @WillNess) – גלעד ברקן Aug 23 '19 at 16:10
  • @גלעדברקן same number here. :) takes 2.2 minutes in Haskell interpreter (GHCi) with the compiled library code for the primes list from package "arithmoi" by Daniel Fischer (originally), filtered and counted by the million (to see the progression). – Will Ness Aug 23 '19 at 16:28
  • @גלעדברקן in case you're interested, the code is `P.primes & dropWhile (< 1000000000) & takeWhile (< 2000000000) & tails & concatMap (\case (x:y:_) -> [() | y-x==6] ; _->[]) & chunksOf 1000000 & map length & scanl (+) 0` with `chunksOf n = takeWhile (not.null) . unfoldr (Just . splitAt n)`. actually, it took 86.5 seconds on the first run, 65 seconds on subsequent runs (on my 4.5 years old laptop). – Will Ness Aug 23 '19 at 16:43
  • @WillNess nice, and thanks for confirming. I imagine arithmoi is about as optimised as it gets. But what happened to "no external libraries? ;) – גלעד ברקן Aug 23 '19 at 16:54
  • @גלעדברקן hence no posted answer. :) – Will Ness Aug 24 '19 at 08:21