finding the lowest collatz sequence that gives more that 65 primes

Question

I have a task where I need to find the lowest Collatz sequence that contains more than 65 prime numbers in Python.

For example, the Collatz sequence for 19 is:

19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1

This sequence contains 7 prime numbers.

I also need to use memoization so it doesn't have to run a "year" to find it. I found code for memoization of Collatz sequences, but I can't figure out how to get it to work when I need only the prime numbers.

Here is the Collatz memoization code that I found:

lookup = {}
def countTerms(n):
    if n not in lookup:
        if n == 1:
            lookup[n] = 1
        elif not n % 2:
            lookup[n] = countTerms(n / 2)[0] + 1
        else:
            lookup[n] = countTerms(n*3 + 1)[0] + 1

    return lookup[n], n

And here is my tester for prime:

def is_prime(a):
    for i in xrange(2,a):
        if a%i==0:
            #print a, " is not a prime number"
            return False
    if a==1:
        return False
    else:
        return True

well i dont have a problem to check if a number is a prime but i cant figure out how to get the correct number out of the memorization list for every number im testing on. if that made any sense :P — spørreren, Apr 28 '15 at 13:34
The question is a bit unclear, can you please clearly define your question as what do you want to achieve or what are you trying to do ? — ZdaR, Apr 28 '15 at 14:02
I am trying to find all the prime numbers under a specific collatz sequence, using memoization. For example, the collatz sequence for 19 exists of 7 prime numbers. I want to find the colllatz sequence for the lowest possible number that exsists of more than 65 prime numbers. To do this i need to use memoization, or else it would take for ever to calculate. — spørreren, Apr 28 '15 at 14:09
@PadraicCunningham: A sieve isn't suitable here: the numbers that require primality testing get rather large. Eg, by the time we get to a Collatz sequence containing 50 primes we need to test if 121012864 is prime. By 61, we're up to 56991483520. OTOH, a slightly better prime tester won't hurt. :) — PM 2Ring, Apr 28 '15 at 14:26
$ python -i collatz.py >>> solve(7) 19 -- Something like this but for n=65? — James Mills, Apr 28 '15 at 14:27

PM 2Ring · Accepted Answer · 2015-04-29T07:33:06.347

Your existing code is incorrectly indented. I assume this task is a homework task, so I won't post a complete working solution, but I'll give you some helpful snippets.

First, here's a slightly more efficient primality tester. Rather than testing if all numbers less than a are factors of a, it just tests up to the square root of a.

def is_prime(a):
    for i in xrange(2, int(1 + a ** 0.5)):
        if a % i == 0:
            return False
    return True

Note that this function returns True for a = 1. That's ok, since you don't need to test 1: you can pre-load it into the lookup dict:

lookup = {1:0}

Your countTerms function needs to be modified slightly so that it only adds one to the lookup count when the current n is prime. In Python, False has a numeric value of 0 and True has a numeric value of 1. That's very handy here:

def count_prime_terms(n):
    ''' Count the number of primes terms in a Collatz sequence '''
    if n not in lookup:
        if n % 2:
            next_n = n * 3 + 1
        else:
            next_n = n // 2

        lookup[n] = count_prime_terms(next_n) + is_prime(n)
    return lookup[n]

I've changed the function name to be more Pythonic.

FWIW, the first Collatz sequence containing 65 or more primes actually contains 67 primes. Its seed number is over 1.8 million, and the highest number that requires primality testing when checking all sequences up to that seed is 151629574372. At completion, the lookup dict contains 3920492 entries.

In response to James Mills's comments regarding recursion, I've written a non-recursive version, and to make it easy to see that the iterative and the recursive versions both produce the same results I'm posting a complete working program. I said above that I wasn't going to do that, but I figure that it should be ok to do so now, since spørreren has already written their program using the info I supplied in my original answer.

I fully agree that it's good to avoid recursion except in situations where it's appropriate to the problem domain (eg, tree traversal). Python discourages recursion - it cannot optimize tail-call recursion and it imposes a recursion depth limit (although that limit can be modified, if desired).

This Collatz sequence prime counting algorithm is naturally stated recursively, but it's not too hard to do iteratively - we just need a list to temporarily hold the sequence while the primality of all its members are being determined. True, this list takes up RAM, but it's (probably) much more efficient space-wise than the stack frame requirements that the recursive version needs.

The recursive version reaches a recursion depth of 343 when solving the problem in the OP. This is well within the default limit but it's still not good, and if you want to search for sequences containing much larger numbers of primes, you will hit that limit.

The iterative & recursive versions run at roughly the same speed (at least, they do so on my machine). To solve the problem stated in the OP they both take a little under 2 minutes. This is significantly faster than my original solution, mostly due to optimizations in primality testing.

The basic Collatz sequence generation step already needs to determine if a number is odd or even. Clearly, if we already know that a number is even then there's no need to test if it's a prime. :) And we can also eliminate tests for even factors in the is_prime function. We can handle the fact that 2 is prime by simply loading the result for 2 into the lookup cache.

On a related note, when searching for the first sequence containing a given number of primes we don't need to test any of the sequences that start at an even number. Even numbers (apart from 2) don't increase the prime count of a sequence, and since the first odd number in such a sequence will be lower than our current number its results will already be in the lookup cache, assuming we start our search from 3. And if we don't start searching from 3 we just need to make sure our starting seed is low enough so that we don't accidentally miss the first sequence containing the desired number of primes. Adopting this strategy not only reduces the time needed, it also reduces the number of entries in the lookup` cache.

#!/usr/bin/env python

''' Find the 1st Collatz sequence containing a given number of prime terms

    From http://stackoverflow.com/q/29920691/4014959

    Written by PM 2Ring 2015.04.29

    [Seed == 1805311, prime count == 67]
'''

import sys

def is_prime(a):
    ''' Test if odd `a` >= 3 is prime '''
    for i in xrange(3, int(1 + a ** 0.5), 2):
        if not a % i:
            return 0
    return 1


#Track the highest number generated so far; use a list
# so we don't have to declare it as global...
hi = [2]

#Cache for sequence prime counts. The key is the sequence seed,
# the value is the number of primes in that sequence.
lookup = {1:0, 2:1}

def count_prime_terms_iterative(n):
    ''' Count the number of primes terms in a Collatz sequence 
        Iterative version '''
    seq = []
    while n not in lookup:
        if n > hi[0]:
            hi[0] = n

        if n % 2:
            seq.append((n, is_prime(n)))
            n = n * 3 + 1
        else:
            seq.append((n, 0))
            n = n // 2

    count = lookup[n]
    for n, isprime in reversed(seq):
        count += isprime
        lookup[n] = count

    return count

def count_prime_terms_recursive(n):
    ''' Count the number of primes terms in a Collatz sequence
        Recursive version '''
    if n not in lookup:
        if n > hi[0]:
            hi[0] = n

        if n % 2:
            next_n = n * 3 + 1
            isprime = is_prime(n)
        else:
            next_n = n // 2
            isprime = 0
        lookup[n] = count_prime_terms(next_n) + isprime

    return lookup[n]


def find_seed(numprimes, start):
    ''' Find the seed of the 1st Collatz sequence containing
        `numprimes` primes, starting from odd seed `start` '''
    i = start
    mcount = 0

    print 'seed, prime count, highest term, dict size'
    while mcount < numprimes:
        count = count_prime_terms(i)
        if count > mcount:
            mcount = count
            print i, count, hi[0], len(lookup)
        i += 2


#count_prime_terms = count_prime_terms_recursive
count_prime_terms = count_prime_terms_iterative

def main():
    if len(sys.argv) > 1:
        numprimes = int(sys.argv[1])
    else:
        print 'Usage: %s numprimes [start]' % sys.argv[0]
        exit()

    start = int(sys.argv[2]) if len(sys.argv) > 2 else 3 

    #Round `start` up to the next odd number
    if start % 2 == 0:
        start += 1

    find_seed(numprimes, start)


if __name__ == '__main__':
    main()

When run with

$ ./CollatzPrimes.py 65

the output is

seed, prime count, highest term, dict size
3 3 16 8
7 6 52 18
19 7 160 35
27 25 9232 136
97 26 9232 230
171 28 9232 354
231 29 9232 459
487 30 39364 933
763 32 250504 1626
1071 36 250504 2197
4011 37 1276936 8009
6171 43 8153620 12297
10971 44 27114424 21969
17647 48 27114424 35232
47059 50 121012864 94058
99151 51 1570824736 198927
117511 52 2482111348 235686
202471 53 17202377752 405273
260847 55 17202377752 522704
481959 59 24648077896 966011
963919 61 56991483520 1929199
1564063 62 151629574372 3136009
1805311 67 151629574372 3619607

I'd like to point out that your solution **will* blow the stack :) (*rather quickly*) ``count_prime_terms(7)`` BOOM! — James Mills, Apr 28 '15 at 14:50
i.e: solving this by "brute force" is not going to be very efficient :) -- even with memoization. You'll either blow the stack or blow your resources -- Take your pick! — James Mills, Apr 28 '15 at 14:54
@JamesMills: Au contraire! My code solves this easily without hitting the recursion limit. It takes a little over 6 minutes on my 2GHz, single core machine, if I start searching from 2. But if I start searching from 1800000, I get the solution in under 10 seconds. Note that the function only recurses if it can't find `n` in the `lookup` dict. — PM 2Ring, Apr 28 '15 at 15:01
Thanks mate! been stuck on this stuck on this task for hours, got hung up in the memoization i found and tried to make that work, but this worked perfectly :) thanks again! :) — spørreren, Apr 28 '15 at 15:05
@spørreren: No worries! I've "polished" up your question a bit so it will be a bit easier to understand for future readers. — PM 2Ring, Apr 28 '15 at 15:14
@PM2Ring I'm not sure about your solution. I've been comparing yours to mine and what your produces. hmmm :) — James Mills, Apr 28 '15 at 15:36
Perhaps I don't fully understand either the OP's question or the problem :) See: http://www.primepuzzles.net/puzzles/puzz_476.htm for something similar? — James Mills, Apr 28 '15 at 16:00
@spørreren: You might like to take a look at the additions I've made to my answer... :) — PM 2Ring, Apr 29 '15 at 07:15
@JamesMills: I agree that recursion is generally a Bad Thing™ in Python. So perhaps you'll like the iterative solution to this problem which I've appended to my original answer. — PM 2Ring, Apr 29 '15 at 07:16

user448810 · Answer 2 · 2015-04-29T13:44:22.073

Let's begin with a function that determines if a number is prime; we will use the Miller-Rabin algorithm, which is faster than trial division for the size of numbers that we will be dealing with:

from random import randint

def isPrime(n, k=5): # miller-rabin
    if n < 2: return False
    for p in [2,3,5,7,11,13,17,19,23,29]:
        if n % p == 0: return n == p
    s, d = 0, n-1
    while d % 2 == 0:
        s, d = s+1, d/2
    for i in range(k):
        x = pow(randint(2, n-1), d, n)
        if x == 1 or x == n-1: continue
        for r in range(1, s):
            x = (x * x) % n
            if x == 1: return False
            if x == n-1: break
        else: return False
    return True

Since we want to find the first number that satisfies the criteria, our strategy will be to start from 2 and work our way up, storing results as we go. We prime the cache (that's a bad pun, sorry) with the prime counts in the Collatz sequences from 0 to 2:

primeCount = [0,0,1]

Function pCount(n) counts the primes in the Collatz sequence for n. As soon as the current value of the sequence k drops below n, we look up the result in the cache. Until then, we test each odd number in the Collatz sequence for primality and increment the prime count p if appropriate. When we have the prime count for n, we add it to the cache and return it.

def pCount(n):
    k, p = n, 0
    while k > 0:
        if k < n:
            t = p + primeCount[k]
            primeCount.append(t)
            return t
        elif k % 2 == 0:
            k = k / 2
        elif isPrime(k):
            p = p + 1
            k = 3*k + 1
        else:
            k = 3*k + 1

Now it's just a matter of computing the prime counts for each n, in order starting from 3, stopping when the prime count exceeds 65:

n = 3
t = pCount(n)
while t < 65:
    n = n + 1
    t = pCount(n)

That doesn't take long, less than a minute on my computer. Here's the result:

print n

There are 67 primes in the result. If you want to see them, here is a simple function that prints the Collatz sequence for a given n:

def collatz(n):
    cs = []
    while n != 1:
        cs.append(n)
        n = 3*n+1 if n&1 else n//2
    cs.append(1)
    return cs

And here's the list of primes:

filter(isPrime,collatz(n))

What a fun problem of recreational mathematics!

EDIT: Since people asked about the Miller-Rabin primality tester, let me show this simple primality tester based on a 2,3,5-wheel; it does trial division by 2, 3 and 5 and by numbers that aren't multiples of 2, 3, or 5, which includes some composites, so it's a little bit less efficient that trial division by primes, but there is no need to pre-compute and store the primes, so it's much easier to use.

def isPrime(n): # 2,3,5-wheel
    if n < 2: return False
    wheel = [1,2,2,4,2,4,2,4,6,2,6]
    w, next = 2, 0
    while w * w <= n:
        if n % w == 0: return False
        w = w + wheel[next]
        next = next + 1
        if next > 10: next = 3
    return True

Saying filter(isPrime,range(1000000)) identifies the 78498 primes less than a million in just a few seconds. You might want to compare timings for this primality tester compared to the Miller-Rabin tester and determine where the crossover point is in terms of runtime efficiency; I didn't do any formal timings, but it seems to solve the 65-collatz-prime problem in about the same time as the Miller-Rabin teseter. Or you might want to combine trial division with a 2,3,5-wheel up to some limit, say 1000 or 10000 or 25000, then run Miller-Rabin on the survivors; that quickly eliminates most composites, so it can run very quickly.

Could you paste your actual results? I'm curious to see whether we agree on the same solution. — James Mills, Apr 28 '15 at 23:36
Your result: n=1805311 -- Which I believe to be correct. Interested ti see what the OP says? -- My own results back this up: https://gist.github.com/prologic/db7667d419ce3f903745 (albeit my algorithm isn't as efficient ... yet!) — James Mills, Apr 29 '15 at 01:51
Yes, it is an interesting problem! FWIW, I wrote my iterative function before I saw your code, even though they're essentially the same algorithm. Thanks for posting that Miller-Rabin code; I was toying with the idea of using M-R myself, but I resisted the temptation to premature optimization. :) I just plugged your Miller-Rabin function into my code above. When testing for 52 primes, using M-R is around 20% _slower_ than trial factorization. For 65 primes, M-R is about 10 seconds faster, and I expect it will be noticeably superior when searching for much larger prime counts. — PM 2Ring, Apr 29 '15 at 07:41
@PM 2Ring: I had the Miller-Rabin test available, so I used it. I just added a trial-division primality tester to my answer, so you can do some comparative timings if you want. This problem will appear on [my blog](http://programmingpraxis.com) on Friday. — user448810, Apr 29 '15 at 13:46

finding the lowest collatz sequence that gives more that 65 primes

2 Answers2