Improve the perfomance of a code that uses for-loops

Question

I am trying to create a list based on some data, but the code I am using is very slow when I run it on large data. So I suspect I am not using all of the Python power for this task. Is there a more efficient and faster way of doing this in Python?

Here an explanantion of the code:

You can think of this problem as a list of games (list_type) each with a list of participating teams and the scores for each team in the game (list_xx).For each of the pairs in the current game it first calculate the sum of the differences in score from the previous competitions (win_comp_past_difs); including only the pairs in the current game. Then it update each pair in the current game with the difference in scores. Using a defaultdict keeps track of the scores for each pair in each game and update this score as each game is played.

In the example below, based on some data, there are for-loops used to create a new variable list_zz.

The data and the for-loop code:

import pandas as pd
import numpy as np
from collections import defaultdict
from itertools import permutations

list_type = [['A', 'B'], ['B'], ['A', 'B', 'C', 'D', 'E'], ['B'], ['A', 'B', 'C'], ['A'], ['B', 'C'], ['A', 'B'], ['C', 'A', 'B'], ['A'], ['B', 'C']]

list_xx = [[1.0, 5.0], [3.0], [2.0, 7.0, 3.0, 1.0, 6.0], [3.0], [5.0, 2.0, 3.0], [1.0], [9.0, 3.0], [2.0, 7.0], [3.0, 6.0, 8.0], [2.0], [7.0, 9.0]]

list_zz= []
#for-loop
wd = defaultdict(float)
for i, x in zip(list_type, list_xx):
    # staff 1
    if len(i) == 1:
        #print('NaN')
        list_zz.append(np.nan)
        continue
    # Pairs and difference generator for current game (i)
    pairs = list(permutations(i, 2))
    dgen = (value[0] - value[1] for value in permutations(x, 2))
    # Sum of differences from previous games incluiding only pair of teams in the current game
    for team, result in zip(i, x):
        win_comp_past_difs = sum(wd[key] for key in pairs if key[0] == team)
        #print(win_comp_past_difs)
        list_zz.append(win_comp_past_difs)
    # Update pair differences for current game
    for pair, diff in zip(pairs, dgen):
        wd[pair] += diff
print(list_zz)

Which looks like this:

[0.0,
 0.0,
 nan,
 -4.0,
 4.0,
 0.0,
 0.0,
 0.0,
 nan,
 -10.0,
 13.0,
 -3.0,
 nan,
 3.0,
 -3.0,
 -6.0,
 6.0,
 -10.0,
 -10.0,
 20.0,
 nan,
 14.0,
 -14.0]

If you could elaborate on the code to make it more efficient and execute faster, I would really appreciate it.

score 1 · Answer 1 · answered Feb 16 '20 at 18:08

Without reviewing the overall design of your code, one improvement pops out at me: move your code to a function.

As currently written, all of the variables you use are global variables. Due to the dynamic nature of the global namespace, Python must look up each global variable you use each and every time you use access it.⁽¹⁾ In CPython, this corresponds to a hash table lookup, which can be expensive, particularly if hash collisions are present.

In contrast, local variables can be known at compile time, and so are stored in a fixed-size array. Accessing these variables therefore only involves dereferencing a pointer, which is comparatively much faster.

With this principal in mind, you should be able to boost your performance (somewhere around a 40% drop in run time) by moving all you your code into a "main" function:

def main():
    ...
    # Your code here

if __name__ == '__main__':
    main()

⁽¹⁾ Source

@MarioArend Optimizations such as this will only ever give you a _slight_ edge. Your code is being bottle-necked by entirely different factors, which I have not yet taken the time to find. — Brian61354270, Feb 16 '20 at 18:17

Improve the perfomance of a code that uses for-loops

1 Answers1