I have a dataframe `next_train` with weekly data for many players (80,000 players observed over 4 weeks, for a total of 320,000 observations) and a dictionary `players` containing a binary variable for some of the players (say 10,000). I want to add this binary variable to `next_train` (if a player is not in `players`, I set the variable to zero). This is how I'm doing it:
```python
import pandas as pd

next_train = pd.read_csv()
# ... calculate dictionary 'players' ...
next_train['variable'] = 0
for player in players:
    next_train.loc[next_train['id_of_player'] == player, 'variable'] = players[player]
```
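Below is a small, self-contained reproduction of the loop above. The frame, the ids, and the dictionary values are all hypothetical toy data, not the real dataset. The `comparisons` counter is an illustrative addition that tallies how many row-level comparisons the boolean masks perform.

```python
import pandas as pd

# Hypothetical toy data: 4 players observed over 2 weeks
# (the real frame has 80,000 players x 4 weeks = 320,000 rows).
next_train = pd.DataFrame({
    "id_of_player": [1, 2, 3, 4, 1, 2, 3, 4],
    "week": [1, 1, 1, 1, 2, 2, 2, 2],
})
# Hypothetical dictionary that covers only some of the players.
players = {2: 1, 4: 0}

next_train["variable"] = 0
comparisons = 0  # illustrative counter, not part of the original code
for player in players:
    # Element-wise comparison over the whole column: one boolean per row.
    mask = next_train["id_of_player"] == player
    comparisons += len(mask)
    next_train.loc[mask, "variable"] = players[player]

print(comparisons)  # 2 players x 8 rows = 16
print(next_train["variable"].tolist())  # [0, 1, 0, 0, 0, 1, 0, 0]
```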
However, the `for` loop takes ages to complete, and I don't understand why. It looks like the task is to perform a binary search for the value `player` in my dataframe 10,000 times (the size of the `players` dictionary), but the execution takes several minutes. Is there an efficient way to do this?
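One likely explanation is that `next_train['id_of_player'] == player` is not a binary search. It compares every row against `player` and builds a full-length boolean mask, so the loop does roughly 10,000 × 320,000 = 3.2 billion comparisons. The sketch below shows one vectorized alternative, `Series.map` with the dictionary followed by `fillna(0)`. It uses hypothetical toy data and assumes the variable is integer-valued, so the result is cast back to `int`.

```python
import pandas as pd

# Hypothetical toy data standing in for the real next_train / players.
next_train = pd.DataFrame({
    "id_of_player": [1, 2, 3, 4, 1, 2, 3, 4],
    "week": [1, 1, 1, 1, 2, 2, 2, 2],
})
players = {2: 1, 4: 0}

# Series.map looks up every id in the dictionary in a single vectorized pass.
# Ids missing from the dictionary become NaN; fillna(0) sets them to zero,
# and astype(int) restores an integer dtype (assumes a 0/1 variable).
next_train["variable"] = (
    next_train["id_of_player"].map(players).fillna(0).astype(int)
)

print(next_train["variable"].tolist())  # [0, 1, 0, 0, 0, 1, 0, 0]
```

This replaces the per-player scans with one pass over the column plus a lookup table built from the dictionary. A left `merge` against a two-column frame built from `players` would be another option in the same spirit.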