How to optimize checking thousands of list elements thousands of times?

Question

I'm trying to develop a program that running the first n integers through the "Collatz Conjecture function", denoted as "c(x)", where any odd number is updated to thrice itself plus 1, and any even number is updated to half of itself, and it runs this until the number gets updated to 1. This is simply as a programming exercise. I'm not looking to prove anything with this. I just wanted to learn about optimizations such as bit manipulation.

In order to speed up the process I made it create a list of every unique number it generates through this function, and if it generates a previously generated number then it moves onto the next input. This creates the problem, however, of taking loads of time checking each element in my list (called "nums") every time it runs the function again.

The code I used for it looks like this:

if x not in nums:
    nums.append(int(x))
    c(x)

Is there a way to make checking this list for already-generated numbers any faster? Or is there maybe a totally different way that removes the need for the list and/or checking every element altogether? Once you start getting above n = 100,000 (meaning 100,000 starting inputs) checking the list takes a noticeably longer time, and any optimization could cut out an entire hour of calculating for larger values of n.

If you don't care about the order of elements, make nums a set instead of a list. Checking set membership is a lot faster than a list. — John Gordon, Aug 21 '21 at 00:50
To be clear, the "checking" that you're talking about is *only* the `x not in nums` part? I.e. you just want fast detection of whether `x in nums`? — Karl Knechtel, Aug 21 '21 at 01:39
As an aside, assuming that `c` is the function where this code is located, you probably do not want to use recursion for something that can be a simple while loop. Python does not implement tail recursion optimization by default, and what you've written might not even be properly tail recursive (if you don't know what all of that means: remember that when you call a function, it will eventually have to `return` somewhere; when you call the current function from within itself, that is **not** a "goto" to the top of the function.) — Karl Knechtel, Aug 21 '21 at 01:44
@JohnGordon I actually completely forgot that sets did that, I haven't worked much with them yet, so this actually really helps. — Take, Aug 22 '21 at 03:59
@KarlKnechtel yes, I was basically just looking for a faster way to detect values in the list (or whichever iterable works best, like sets in this case), or to find a faster solution altogether. That was the main thing slowing my program down, along with eating up loads of memory, and tail recursion was just my go-to method, so I was pretty much just winging it and hoping it would work, a while loop instead of a full on recursive function never occurred to me. — Take, Aug 22 '21 at 04:00

score 0 · Accepted Answer · answered Aug 21 '21 at 01:09

If you implement the algorithm as a recursive function, then I'd suggest looking at the functools function cache (or lru_cache).

If you have a processing-intensive function, these decorators will store the result for a given input, so that, if called again later in the same process, the function can just return the previously-calculated result. For highly-recursive algorithms (like factorials and, to a degree, collatz), this can result in huge time savings.

That said, if you implement the algorithm recursively, you risk hitting Python's recursion limit.

So glad you responded with this, `cache` made the whole program so much more optimized, not only in speed, but I also managed to condense my code into 9 lines, so this was exactly what I needed! — Take, Aug 22 '21 at 03:54

How to optimize checking thousands of list elements thousands of times?

1 Answers1