0

I am trying to perform a Collatz algorithm on the following code. It works fine when I use a range of 1-10 etc... However, if the range is for example 1-500,000 it's too slow and won't ever show me the output of the longest sequence.

numberArray = []

s=int(1)
f=int(10)

def collatz(n):
    global count
    if n == 1:
        count += 1
        numberArray.append(count)
        return True
    elif (n % 2) == 0:
        count += 1
        collatz(n/2)
    else:
        count += 1
        collatz(3*n+1)
        
for i in range (s, f+1):
  count = 0
  ourNumber = i
  collatz(i)

print(max(numberArray))
PoTheBox
  • 21
  • 4
  • You should show what you've tried. Someone can do it for you and post the solution but it helps to see where you went wrong and provide advice from there. – Matt Feb 03 '22 at 22:44
  • Two very important pieces of advice: (1) Don't use `global`. (2) Use `return`. – Stef Feb 03 '22 at 22:45
  • Here you want your function to return the length of the sequence. So, use `return` to return that. When`n==1`, the length of the sequence is 1, so use `return 1`, not `return True`. When `n % 2 == 0`, the length of the sequence is going to be 1 more than the length of the sequence starting with `n // 2`, so `return 1 + collatz(n // 2)`. Etc. – Stef Feb 03 '22 at 22:47
  • Once you are using `return` properly, adding a cache becomes easier. – Stef Feb 03 '22 at 22:48
  • Do not use CamelCase for variable identificators, put spaces between operators and operands if operator is not `:`, no need to cast int to int `s = 1` is the same as `s = int(1)` – sudden_appearance Feb 03 '22 at 23:02
  • *"1-500,000 it's too slow and **won't ever** show me the output"* - Um... I just tried that and it only took 24 seconds. – Kelly Bundy Feb 04 '22 at 01:06
  • Yes, even running with 1-10,000,000 it still finishes, taking about 10 minutes. The version using the dictionary to cache the results, however, takes only about 20s. – vorrade - Feb 04 '22 at 10:51
  • 1
    https://stackoverflow.com/questions/70750152/further-optimisation-of-project-euler-problem-14-collatz-sequence/70750549#70750549 – Matt Timmermans Feb 04 '22 at 12:32
  • @MattTimmermans: Ok, I took up your challenge and improved slightly on your performance! – vorrade - Feb 06 '22 at 10:37

2 Answers2

2

Stef means something like this, which uses a dictionary to memorise the values that have already been counted:

s = 1
f = 10000000

def collatz(n):
    if n in collatz.memory:
        return collatz.memory[n]
    if (n % 2) == 0:
        count = collatz(n//2)+1
    else:
        count = collatz((3*n+1)//2)+2
    collatz.memory[n] = count
    return count

collatz.memory = {1:0}

highest = max(collatz(i) for i in range(s, f+1))
highest_n = max(collatz.memory, key=collatz.memory.get)

print(f"collatz({highest_n}) is {highest}")

Output: collatz(8400511) is 685

vorrade -
  • 114
  • 5
  • 1
    you can use an object attribute instead of a global: `def collatz(n): if n in collatz.data: ...; ...` then below the definition of the function: `collatz.data = {1:1}`. – Stef Feb 03 '22 at 22:58
  • I have been asking myself, whether it would be more efficient to just go through the odd numbers. I think it might make a noticeable difference... – vorrade - Feb 03 '22 at 22:58
  • 1
    and you can write directly `highest = max(collatz(i) for i in range(s, f+1))` – Stef Feb 03 '22 at 22:59
  • Thanks for the useful tips - making things more Pythonic. – vorrade - Feb 03 '22 at 23:02
  • This might be silly, but how does a simple function have an attribute? From where does `collatz.calculatedData` originate? I can see this as a class attrib, but `collatz` is a simple function. – S3DEV Feb 03 '22 at 23:12
  • 1
    Functions are objects, too. So they can have attributes - as the example shows! – vorrade - Feb 03 '22 at 23:17
  • Or `max(map(collatz, range(s, f+1, 2)))`. – Kelly Bundy Feb 03 '22 at 23:54
  • 1
    Oh and I'd either hardcode 1 instead of s, or ensure that s is odd. Otherwise you risk someone trying an even s and failing. – Kelly Bundy Feb 03 '22 at 23:56
  • @KellyBundy : good point – vorrade - Feb 04 '22 at 09:51
  • @vorrade- where did you get caclulatedData from? is that numberArray in my situation ? – PoTheBox Feb 04 '22 at 11:01
  • @PoTheBox : the `calculatedData` is an attribute of the `collatz` function that is created by the code line `collatz.calculatedData = {1:1}` and predefines the result of `collatz(1)` to be `1` – vorrade - Feb 04 '22 at 11:27
  • @PoTheBox : I have also used the fact the `3*n+1` is an even number to optimise the code. And I have also printed out which n gets the highest result. – vorrade - Feb 04 '22 at 11:48
  • I have now renamed `calculatedData` to `memory`. Apparently the naming conventions want camelCase only in class names... – vorrade - Feb 04 '22 at 13:20
0

Use lru_cache decorator. Its function to memorize (cache) the returned value of function called with specific argument.

Also read how to write clean code in python

The next code solves your problem

from functools import lru_cache

number_array = []

s = 1
f = 500000


@lru_cache
def collatz(n: int):
    if n == 1:
        return 1
    elif n % 2 == 0:
        return 1 + collatz(n // 2)
    else:
        return 1 + collatz(3 * n + 1)


for i in range(s, f + 1):
    number_array.append(collatz(i))

print(number_array)
sudden_appearance
  • 1,968
  • 1
  • 4
  • 15
  • 1
    I really like the cleanliness of using `lru_cache` and think this is a great solution for actual code. If the OP is trying to learn to implement memoization, this hides that from them. – Matt Feb 03 '22 at 23:21
  • 1
    This solution actually fails with `f = 10000000` because the lru_cache doesn't work the way it is being used here. The problem is that both `n` and `step` are being used by the lru_cache, so there are no matches. – vorrade - Feb 03 '22 at 23:36
  • 1
    To be more exact, setting `f = 10000000` leads to: RecursionError: maximum recursion depth exceeded – vorrade - Feb 03 '22 at 23:40
  • 1
    @vorrade Indeed. The added step parameter is really bad, although there might be *some* matches. Using lru_cache's default to only remember the 128 latest cases doesn't help, either. – Kelly Bundy Feb 03 '22 at 23:44
  • Ok @vorrade-, i have updated with removing `step` from recursion, guess this will work now – sudden_appearance Feb 03 '22 at 23:50
  • @KellyBundy, updated without `step` now – sudden_appearance Feb 03 '22 at 23:50
  • [98.7% cache misses](https://tio.run/##fVDLasMwELzrK/YSItWtHwmBYgj0P0IwqiLVAlsSK/mQln67u36kCRQ6B2mHnRl2N1xT693@NeA4GvQ9mMGp5H0XwfbBY4IOh0ZJ1WrG3NC/a2wkorzCEU5nxiL9FTP0HsoJjLG3u@OiDSjfdTJ9cleDdUnUDAjWgIMjWRc6AXUa0FHYRHQ3Kzawm1TlXxVk92AoCtiJ1Rf1v@I9PFFuBpWgSY1HsDQVoHQfmsdnMHNrSXjcNpchaHfhtxgryB@QFuKPstNLVdZnwZSlg6zafD5FY53xXKweZW@V2X4pm/c2Rh2hAP5LMqpam6KoN98wR8DS2Ipx/AE). Really bad. – Kelly Bundy Feb 04 '22 at 00:03
  • @KellyBundy, considering that max number we reach in (1, 500000) range is 2,4*10^10 (which is 48000 times bigger) we can agree that this is not the worst scenario – sudden_appearance Feb 04 '22 at 00:10
  • Not sure what you mean. What is the worst scenario? – Kelly Bundy Feb 04 '22 at 00:13
  • @sudden_appearance this seems fine and all until errors in ```Traceback (most recent call last): def collatz(n: int):``` and also ```TypeError: Expected maxsize to be an integer or None``` for lru cache – PoTheBox Feb 04 '22 at 11:15
  • @PoTheBox, if you are using python3.9+ use just `cache` instead of `lru_cache`. To increase recursion depth use `sys.setrecursionlimit(depth: int)` – sudden_appearance Feb 04 '22 at 11:19
  • @sudden_appearance : Do you know why using `@lru_cache` affects the allowed recursion depth? – vorrade - Feb 04 '22 at 11:23
  • @sudden_appearance apparently the error is within the decorator and my python supports lru_cache but I cant get it to run ```TypeError: Expected maxsize to be an integer or None``` – PoTheBox Feb 04 '22 at 12:05
  • @sudden_appearance this is the full code right now ```from functools import lru_cache number_array = [] s = 1 f = 500000 @lru_cache def collatz(n: int): if n == 1: return 1 elif n % 2 == 0: return 1 + collatz(n // 2) else: return 1 + collatz(3 * n + 1) for i in range (s, f+1): number_array.append(collatz(i)) print(number_array)``` – PoTheBox Feb 04 '22 at 12:07
  • @PoTheBox what python version are you using? – sudden_appearance Feb 04 '22 at 12:08
  • @sudden_appearance oh wow, Python 2.7.17.... – PoTheBox Feb 04 '22 at 12:16
  • @PoTheBox, then [have a read](https://stackoverflow.com/questions/11815873/memoization-library-for-python-2-7) – sudden_appearance Feb 04 '22 at 12:19
  • @sudden_appearance yeah had a go before you sent that somehow, and its an error ```No module named 'functools32'``` or ```No module named 'repoze.lru'``` what I am gonna try to do is install it in my dependancy or something, kinda lost but I am going to try to see what I can do, thanks for your help! – PoTheBox Feb 04 '22 at 12:25