Is there a Pythonic way of skipping if statements in a for loop to make my code run faster?

Question

I'm writing a script in Python that essentially rolls a dice and checks whether the die roll exceeds a number x. I want to repeat this process n times and get the probability that the die roll exceeds the number x. e.g.

Count = 0
for _ in itertools.repeat(None, Iterations):
    x = 3
    die_roll = rnd.randint(1,6)
    if die_roll > x:
        Count += 1
Probability_of_exceed = Count / Iterations

I want to modify both the die roll and x based on user input. This user input will select different routines to modify the script e.g. "Andy's_Routine" might change x to 4. Currently I implement this using if statements in the for loop to check which routines are active, then applying them e.g.

Count = 0
for _ in itertools.repeat(None, Iterations):
    x = 3

    if "Andy's_Routine" in Active_Routines:
        x = 4

    die_roll = rnd.randint(1,6)
    if "Bill's_Routine" in Active_Routines:
        die_roll += 1 
    if "Chloe's_Routine" in Active_Routines:
        # do something
        pass

    if "Person_10^5's_Routine" in Active_Routines:
        # do something else
        pass

    if die_roll > x:
        Count += 1
Probability_of_exceed = Count / Iterations

In practice the routines are not so simple that they can be generalised, they might add an extra output for example. The routines can be and are concurrently implemented. The problem is that there could be thousands of different routines, such that each loop will spend the majority of its time checking the if statements, slowing down the program.

Is there a better way of structuring the code that checks which routines are in use only once, and then modifies the iteration somehow?

Are you looking for something like a switch statement? If so python doesn't support that, but you can use a `dict` to do similar stuff (though will be a bit more work in your case). — r.ook, Nov 08 '19 at 13:39
Perhaps you want the routines to be lambda functions in a dictionary and you just call the function corresponding to the name of the routine you want to run. Then you only have to do a single lookup in a dictionary — Metareven, Nov 08 '19 at 13:42
Instead of `Active_Routines` being a list of strings, make it a list of functions, and then execute all those function in each of your loop iterations. — Dan, Nov 08 '19 at 13:42
It is very unlikely that `if` statements are what is making your code run slowly. It's more likely because you are executing many sequential functions. If your number of iterations is high, then maybe build a table of results for combos of `x` and `die_roll`, that way if your iteration has the same state as a previous one, instead of recalculating all your functions, just look up the result you previously calculated (i.e. memoization). — Dan, Nov 08 '19 at 14:02
It would help if you added context on how many iterations you are performing and how many routines there are. Also maybe use a profiler, it could be that one of your routines is the bottle neck for example. — Dan, Nov 08 '19 at 14:06
@MadPhysicist Probably because it's faster than `range`: https://stackoverflow.com/a/9098860/7851470 — Georgy, Nov 08 '19 at 14:13
To be completely transparent the number of if statements is currently not a practical problem, the iteration is only run ~10^5 times. In the future as more routines are added the number of if statements may become a problem, so I was wondering if anyone had any neat (and general) solutions. — Ed Muir, Nov 08 '19 at 14:13
@Dan, the caching is a good idea, but won't be applicable if e.g. David's routine is to re-roll the dice. — kaya3, Nov 08 '19 at 14:27
@EdMuir http://wiki.c2.com/?PrematureOptimization - go with a pep8ified version of kaya3's answer — Dan, Nov 08 '19 at 14:32
@kaya3 but you can cache up to any non-deterministic steps. So if the first 10 routines are deterministic, cache those. Then if there is one random one and another 10 deterministic, make a second cache etc — Dan, Nov 08 '19 at 14:33
@Dan Yes, it's a good idea, and suggests representing each routine as an object with an `execute` method and an `is_deterministic` property. — kaya3, Nov 08 '19 at 14:36
@Dan You're right this certainly breaks the YouArentGonnaNeedIt rule. I think to solve my problem I'd need to write some kind of compiler that takes the routines and stitches them together somehow, this is well beyond the scope of the project. Thanks for your answers! — Ed Muir, Nov 08 '19 at 15:30

kaya3 · Accepted Answer · 2019-11-08T13:56:22.800

You're asking two things here - you want your code to be more Pythonic, and you want it to run faster.

The first one is easier to answer: make Active_Routines a list of functions instead of a list of strings, and call the functions from the list. Since these functions may need to change the local state (x and die_roll), you will need to pass them the state as parameters, and let them return a new state. The refactor might look like this:

def Andy(x, die_roll):
    return (4, die_roll)

def Bill(x, die_roll):
    return (x, die_roll + 1)

def Chloe(x, die_roll):
    # do something
    return (x, die_roll)

Active_Routines = [Andy, Bill, Chloe]

Count = 0
for i in range(Iterations):
    x = 3
    die_roll = rnd.randint(1,6)

    for routine in Active_Routines:
        x, die_roll = routine(x, die_roll)

    if die_roll > x:
        Count += 1

Probability_of_exceed = Count / Iterations

The second one is harder to answer. This refactoring now makes a lot of function calls instead of checking if conditions; so there could be fewer missed branch predictions, but more function call overhead. You would have to benchmark it (e.g. using the timeit library) to be sure. However, at least this code should be easier to maintain.

You should use pep8 naming conventions if you're making the code more pythonic. Also maybe tuple unpacking for readbility i.e. instead of `state` `x, die_roll = routine(x, die_roll)`. — Dan, Nov 08 '19 at 13:56
That's true; I chose to leave the names the same in case it interfaced with any other code, and because the naming wasn't related to the question, but you're right, following Python's naming conventions would improve the code. — kaya3, Nov 08 '19 at 13:57
Sorry, I should have left the "be pythonic" condition out of my question. I really only want to know if there is a better way to structure the code so that it runs faster. I added that condition because my first thought was to write something that writes another python script and then runs that, which does not sound very sightly... — Ed Muir, Nov 08 '19 at 14:02

Is there a Pythonic way of skipping if statements in a for loop to make my code run faster?

1 Answers1