Python - Memoization and Collatz Sequence

Question

When I was struggling with Problem 14 in Project Euler, I discovered that I could use a technique called memoization to speed up my program (I let it run for a good 15 minutes, and it still hadn't returned an answer). The thing is, how do I implement it? I've tried, but I get a `KeyError` (the value being returned is invalid). This bugs me because I am positive I can apply memoization here and make it faster.

lookup = {}

def countTerms(n):
   arg = n
   count = 1
   while n is not 1:
      count += 1
      if not n%2:
         n /= 2
      else:
         n = (n*3 + 1)
      if n not in lookup:
         lookup[n] = count

   return lookup[n], arg

print max(countTerms(i) for i in range(500001, 1000000, 2))

Thanks.

I thought the point of memoization was to see if there was first a calculated value and if not then to calculate it and store it. This looks like you are storing it but never testing to see if you don't need to recalculate it. My observation doesn't explain the `keyerror` though — Jason Sperske, Mar 18 '13 at 23:08
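
The check-then-store pattern described in the comment above could look roughly like the sketch below. It is written for Python 3 (the question's code is Python 2), and the names `count_terms`, `start` and `path` are illustrative rather than taken from the post. The idea is to walk the sequence only until a cached value turns up, then fill in the cache for every number visited on the way.

```python
# Cache of chain lengths, seeded with the one value known up front.
lookup = {1: 1}

def count_terms(start):
    """Return the number of terms in the Collatz chain beginning at start."""
    path = []                      # numbers visited that are not cached yet
    n = start
    while n not in lookup:         # test the cache before computing anything
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    count = lookup[n]              # known chain length from n onwards
    for m in reversed(path):       # backfill, starting with the number nearest n
        count += 1
        lookup[m] = count
    return count

print(count_terms(13))  # 10 terms: 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
```

Because every number on the path is stored, any later call that reaches one of them stops early instead of recomputing the rest of the chain.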

welter · Accepted Answer · 2013-03-19T00:09:32.153

4

There is also a nice recursive way to do this, which probably will be slower than poorsod's solution, but it is more similar to your initial code, so it may be easier for you to understand.

lookup = {}

def countTerms(n):
   if n not in lookup:
      if n == 1:
         lookup[n] = 1
      elif not n % 2:
         lookup[n] = countTerms(n / 2)[0] + 1
      else:
         lookup[n] = countTerms(n*3 + 1)[0] + 1

   return lookup[n], n

print max(countTerms(i) for i in range(500001, 1000000, 2))

edited Mar 19 '13 at 00:09

answered Mar 18 '13 at 23:48


At 1.7 seconds, it's actually 22 seconds faster than his. Good job, this is also easier for me to understand! You're awesome :) – Tetramputechture Mar 18 '13 at 23:52
I actually optimized it by replacing 500001 with 3, because it turns out that it's faster if it starts at a smaller number (so it can easily cache numbers) – Tetramputechture Mar 19 '13 at 00:57
The better way to keep the memoized values is by using a static variable. Python doesn't have static variables _per se_, but you can fake them easily: instead of defining `lookup = {}` before the function, define `countTerms.lookup = {}` after (and outside) the function. This variable's state remains unchanged between calls, and you can access it inside the function as `countTerms.lookup` (or as `lookup` directly if you add a `lookup = countTerms.lookup` in the first line inside the function). – Jaime Mar 19 '13 at 03:10
@Jaime Or just make it an argument with default value: `def countTerms(n, lookup={})`, as poorsod did. – welter Mar 19 '13 at 07:46
@welter. Could you explain how this approach works? How did you come across it? – Nov 29 '16 at 17:59
@bartekbrak The only way this solution differs from the classic solution to the problem is that we memoize calculated function values in a dictionary. Thanks to that, the next time we want to know the function value we don't have to calculate it from scratch (which may take a long time) but can just read it from the dictionary where we put it before. Memoization is a standard optimization technique; if you read any websites or books on algorithms, it should be covered there :) – welter Feb 21 '17 at 23:06
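
The two ways of keeping the cache out of the module namespace mentioned in the comments above (a function attribute, or a mutable default argument) can be sketched as follows. This is a Python 3 illustration; `count_terms_attr` and `count_terms_default` are hypothetical names, and both return only the chain length rather than the `(length, n)` tuple used in the answer.

```python
def count_terms_attr(n):
    """Chain length, with the cache stored as an attribute on the function."""
    lookup = count_terms_attr.lookup       # local alias for the shared dict
    if n not in lookup:
        nxt = n // 2 if n % 2 == 0 else 3 * n + 1
        lookup[n] = count_terms_attr(nxt) + 1
    return lookup[n]

# Defined after (and outside) the function; seeded with the base case.
count_terms_attr.lookup = {1: 1}


def count_terms_default(n, lookup={1: 1}):
    """Same idea; the default dict is created once and shared across calls."""
    if n not in lookup:
        nxt = n // 2 if n % 2 == 0 else 3 * n + 1
        lookup[n] = count_terms_default(nxt, lookup) + 1
    return lookup[n]

print(count_terms_attr(6), count_terms_default(6))  # 9 9
```

Both versions recurse once per uncached term, so a very long uncached chain could in principle approach Python's recursion limit; an iterative walk avoids that concern.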

Benjamin Hodgson · Answer 2 · 2013-03-19T01:33:02.117

The point of memoising, for the Collatz sequence, is to avoid calculating parts of the list that you've already done. The remainder of a sequence is fully determined by the current value. So we want to check the table as often as possible, and bail out of the rest of the calculation as soon as we can.

def collatz_sequence(start, table={}):  # cheeky trick: store the (mutable) table as a default argument
    """Returns the Collatz sequence for a given starting number"""
    l = []
    n = start

    while n not in l:  # break if we find ourself in a cycle
                       # (don't assume the Collatz conjecture!)
        if n in table:
            l += table[n]
            break
        elif n%2 == 0:
            l.append(n)
            n = n//2
        else:
            l.append(n)
            n = (3*n) + 1

    table.update({n: l[i:] for i, n in enumerate(l) if n not in table})

    return l

Is it working? Let's spy on it to make sure the memoised elements are being used:

class NoisyDict(dict):
    def __getitem__(self, item):
        print("getting", item)
        return dict.__getitem__(self, item)

def collatz_sequence(start, table=NoisyDict()):
    # etc



In [26]: collatz_sequence(5)
Out[26]: [5, 16, 8, 4, 2, 1]

In [27]: collatz_sequence(5)
getting 5
Out[27]: [5, 16, 8, 4, 2, 1]

In [28]: collatz_sequence(32)
getting 16
Out[28]: [32, 16, 8, 4, 2, 1]

In [29]: collatz_sequence.__defaults__[0]
Out[29]: 
{1: [1],
 2: [2, 1],
 4: [4, 2, 1],
 5: [5, 16, 8, 4, 2, 1],
 8: [8, 4, 2, 1],
 16: [16, 8, 4, 2, 1],
 32: [32, 16, 8, 4, 2, 1]}

Edit: I knew it could be optimised! The secret is that there are two places in the function (the two return points) where we know that `l` and `table` share no elements. Previously I avoided calling `table.update` with elements already in `table` by testing each one; this version of the function instead exploits our knowledge of the control flow, saving lots of time.

`[collatz_sequence(x) for x in range(500001, 1000000)]` now takes around 2 seconds on my computer, while a similar expression with @welter's version clocks in at 400 ms. I think this is because the functions don't actually compute the same thing: my version generates the whole sequence, while @welter's just finds its length. So I don't think I can get my implementation down to the same speed.

def collatz_sequence(start, table={}):  # cheeky trick: store the (mutable) table as a default argument
    """Returns the Collatz sequence for a given starting number"""
    l = []
    n = start

    while n not in l:  # break if we find ourself in a cycle
                       # (don't assume the Collatz conjecture!)
        if n in table:
            table.update({x: l[i:] for i, x in enumerate(l)})
            return l + table[n]
        elif n%2 == 0:
            l.append(n)
            n = n//2
        else:
            l.append(n)
            n = (3*n) + 1

    table.update({x: l[i:] for i, x in enumerate(l)})
    return l

PS - spot the bug!

@Tetramputechture `collatz_sequence` returns `l`, the list of all numbers in the sequence. The 0th element of the returned list would be the original number (the one you gave as the argument to `collatz_sequence`). So `collatz_sequence(n)[0] == n` for all integers `n`. — Benjamin Hodgson, Mar 18 '13 at 23:47
@Tetramputechture I've made a big improvement to the speed - cf my edit. Thanks for the brain-exercise, I enjoyed it! — Benjamin Hodgson, Mar 19 '13 at 01:02

score 0 · Answer 3 · answered Mar 05 '15 at 13:33


This is my solution to PE14:

memo = {1: 1}

def get_collatz(n):
    if n in memo:
        return memo[n]

    if n % 2 == 0:
        terms = get_collatz(n/2) + 1
    else:
        terms = get_collatz(3*n + 1) + 1

    memo[n] = terms
    return terms

compare = 0
for x in xrange(1, 999999):
    if x not in memo:
        ctz = get_collatz(x)
        if ctz > compare:
            compare = ctz
            culprit = x

print culprit


kaffuffle


Please indent code correctly. Also, can you explain how your version relates to others? – Nov 29 '16 at 17:04
