0

I'm trying to write a for loop that takes each row in a dataframe and compares it to the rows in a second dataframe.

If the row in the second dataframe:

  • isn't in the first dataframe already
  • has a higher value in the total points column
  • has a lower cost than the available budget (row_budget)

then I want to remove the row from the first dataframe and add the row from the second dataframe in its place.

Example data:

    df

       code team_name total_points now_cost
78    93284       BHA           38       50
395  173514       WAT           42       50
342   20452       SOU           66       50
92    17761       BUR           97       50
427   18073       WHU           99       50
69    61933       BHA          115       50
130  116594       CHE          116       50

    pos_pool
       code team_name total_points now_cost
438   90585       WOL          120       50
281   67089       NEW          131       50
419   37096       WHU          143       50
200   97032       LIV          208       65
209  110979       LIV          231      115

My expected output for the first three loops should be:

df

       code team_name total_points now_cost
92    17761       BUR           97       50
427   18073       WHU           99       50
69    61933       BHA          115       50
130  116594       CHE          116       50
438   90585       WOL          120       50
281   67089       NEW          131       50
419   37096       WHU          143       50

Here is the nested for loop that I've tried:

for index, row in df.iterrows():
    budget = squad['budget']
    team_limits = squad['team_limits']
    pos_pool = players_1920.loc[players_1920['position'] == row['position']].sort_values('total_points', ascending=False) 
    row_budget = row.now_cost + 1000 - budget

    for index2, row2 in pos_pool.iterrows():
        if (row2 not in df) and (row2.total_points > row.total_points) and (row2.now_cost <= row_budget):
            team_limits[row.team_name] += 1
            team_limits[row2.team_name] -=1
            budget += row.now_cost - row2.now_cost
            df = df.append(row2)
            df = df.drop(row)
        else:
            pass
return df

At the moment I am only iterating through the first dataframe but doesn't seem to do anything in the second.

Oobl
  • 1
  • 3
  • [Don't use iterrows()](https://stackoverflow.com/a/55557758)... – cs95 Jul 21 '19 at 22:28
  • Add example data. – Erfan Jul 21 '19 at 23:04
  • @cs95 thanks for the information and I'll definitely learn the other methods at a later date. For the moment I'm working with a small dataset at the moment so am not too worried about performance. I'm just trying to figure out why the code I've written isn't working properly. – Oobl Jul 21 '19 at 23:30
  • It's note code review here. _"If the row in the second dataframe meets certain conditions"_, explain what "certain conditions" are exactly and what your expected output looks like – Erfan Jul 21 '19 at 23:34
  • @Parfait my expected output does match the inputs. The top three rows of df have been removed and the bottom three rows of df have been added from pos_pool. budget is an integer declared outside the loop, team_limits is a dictionary that counts the number of team_names in the df, and position is a table filter to reduce the number of rows in pos_pool. For my purposes all three can be ignored. – Oobl Jul 22 '19 at 09:21
  • You definitely can handle this without looping. We can propose such a solution but may conflict with *ignored* columns. Original input data is more helpful. And your logic is not quite clear. Why is there two `WHU` records in desired output? And why no `LIV` records? – Parfait Jul 22 '19 at 12:27
  • There are 27 columns in the original df but only the columns in the example have conditions on them. The two WHU records in the output table have different values for the other columns. There are no LIV records in the output column because their now_cost values are higher than than required. – Oobl Jul 22 '19 at 12:41

0 Answers0