
I inherited a legacy codebase with lots of nested for loops looking something like:

def func(infile, some_other_data, outfile, status_variable):
    with open(infile, 'r') as f:
        with open(outfile, 'w') as outf:
            for line in f:
                # parse line
                for element in some_other_data:
                    standard_function(line, element)
                    if status_variable == 'status_A':
                        function_A(line, element)
                    elif status_variable == 'status_B':
                        function_B(line, element)
                    # handle other possible status variables
                    outf.write(new_line)

This code is performance-critical. To speed it up (among other changes) I want to get rid of the if-clauses, which are evaluated n*m times; testing showed that this alone gives roughly a 10% improvement.

To do this, I simply copied and modified the main loop function for every possible status variable and called the different functions accordingly. This effectively moved the if-clauses outside the loops, but it is very ugly and made the library four times as large.

Is there a (fairly) simple, pythonic way of handling such cases, where I want to reuse boilerplate loops and just change what is done in each iteration, WITHOUT evaluating conditionals every time?

I have been playing around with decorators that dynamically return a loop function calling different subfunctions depending on the status variable, but the results looked horrible from a readability perspective. I am by no means a Python expert, so I might be overlooking some handy higher-level features that could help here.

Any advice is highly appreciated.

zeawoas

2 Answers


Ideally you would pass the function itself instead of the status variable, but since this is legacy code, one solution that keeps the interface unchanged is to set up a dictionary of functions like so:

def func(infile, some_other_data, outfile, status_variable,
         status_functions={
             'status_A': function_A,
             'status_B': function_B,
         }
        ):

    try:
        status_function = status_functions[status_variable]
    except KeyError:
        status_function = lambda line, element: None

    with open(infile, 'r') as f, open(outfile, 'w') as outf:
        for line in f:
            # parse line
            for element in some_other_data:
                standard_function(line, element)

                status_function(line, element)
                outf.write(new_line)
n-Holmes
  • As a programmer, you should try very, very, VERY hard not to use global variables. – quamrana Apr 05 '19 at 10:47
  • 1
    You could probably pass the global dictionary as a default-parameter to the function and let the dictionary persist throughout. Additionally, the if-check seems to be missing. – TrebledJ Apr 05 '19 at 10:50
  • Thanks for the suggestions, I've edited to bring the code in line with them. – n-Holmes Apr 05 '19 at 11:12
  • @quamrana I know, but isn't most skepticism towards global variables based in how bad it is to manipulate them? As long as the dict is only used to look stuff up, it doesn't look like that much of a bad practice to me. What am I missing? – zeawoas Apr 05 '19 at 11:14
  • @TrebledJ why passing it as default when it could simply be defined within the function body as well? – zeawoas Apr 05 '19 at 11:21
  • 2
    @zeawoas Because defining the dictionary inside the function will take time in initialising and destroying it. If you're calling the function multiple times, this may take a large toll on the overall performance (especially if there are many status variables). Sure defining in the function body works perfectly fine, but passing it as a default argument only initialises it once. (See ["Least Astonishment" and the Mutable Default Argument](https://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument) for more info.) – TrebledJ Apr 05 '19 at 11:29
  • @n-Holmes The try-except block could be substituted with `dict.get(status_variable, lambda ...)`. Much neater, imo. :-) – TrebledJ Apr 05 '19 at 11:36
  • 1
    @n-Holmes: There is also a way to have an 'initialised-just-once' variable using this 'static_vars' decorator: https://stackoverflow.com/a/279586/4834 – quamrana Apr 05 '19 at 12:46
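As a side note, the `dict.get` variant suggested in the comments could look something like the sketch below. The dummy `function_A`/`function_B` handlers are stand-ins for the real ones; only the lookup pattern is the point:

```python
def function_A(line, element):
    return f"A:{line}:{element}"

def function_B(line, element):
    return f"B:{line}:{element}"

def get_status_function(status_variable,
                        status_functions={
                            'status_A': function_A,
                            'status_B': function_B,
                        }):
    # dict.get with a callable default replaces the try/except block;
    # unknown status values fall through to a no-op handler
    return status_functions.get(status_variable, lambda line, element: None)
```

The dictionary is still a default argument, so it is built only once, and the lookup is a single expression instead of four lines.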

If there is a direct correspondence between status_variable and function, and all the calls have the regular form function(line, element), you can pass in the function itself:

def func(infile, some_other_data, outfile, function_from_status_variable):
    with open(infile, 'r') as f:
        with open(outfile, 'w') as outf:
            for line in f:
                # parse line
                for element in some_other_data:
                    standard_function(line, element)

                    function_from_status_variable(line, element)

                    outf.write(new_line)

The function is computed once, up front:

def calc_function(status_variable):
    if status_variable == 'status_A':
        return function_A
    elif status_variable == 'status_B':
        return function_B
    # other tests follow, plus handle an unknown value

Finally call the functions like this:

function = calc_function(status_variable)
func(infile, some_other_data, outfile, function)

quamrana
  • I was considering that, but I can't change how the function is called elsewhere - so the arguments need to stay the same. Should have been more clear about that in the orig. question, sorry. – zeawoas Apr 05 '19 at 10:42
  • 1
    Ok, so just call `calc_function` from inside `func()` once before the first `with`. – quamrana Apr 05 '19 at 10:44
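Putting that last comment together, calling `calc_function` once inside `func()` keeps the original signature while still hoisting the dispatch out of the hot loop. A minimal sketch of the idea, simplified to lists instead of files and with dummy handlers standing in for the real ones:

```python
def function_A(line, element):
    return ('A', line, element)

def function_B(line, element):
    return ('B', line, element)

def calc_function(status_variable):
    # map the status string to a handler once, outside the loops
    if status_variable == 'status_A':
        return function_A
    elif status_variable == 'status_B':
        return function_B
    return lambda line, element: None  # unknown status: no-op

def func(lines, some_other_data, status_variable):
    # resolved once before the loops, so each iteration is a plain
    # call with no string comparison
    status_function = calc_function(status_variable)
    results = []
    for line in lines:
        for element in some_other_data:
            results.append(status_function(line, element))
    return results
```

The per-iteration cost is then a single indirect call, which is what made the copied-loop version faster, without duplicating the loop body per status.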