
I have a Pandas data frame (called "ud_flex" below) with over 27 million rows, and I'm iterating through it to do a calculation for each row. Below is the calculation that I'm using:

def set_fpts(pos, rank, curr_fpts):
    """Return 0 if the weekly rank is at or past the cutoff for the position, else curr_fpts."""
    if pos == "RB" and rank >= 3.0:
        return 0
    elif pos == "WR" and rank >= 4.0:
        return 0
    elif (pos == "TE" or pos == "QB") and rank >= 2.0:
        return 0
    else:
        return curr_fpts

Here is the for loop that I've created:

players = ud_flex.shape[0]

# ud_flex.at[i, ...] looks rows up by index label, so this assumes the default 0..n-1 index.
for i in range(players):
    new_fpts = set_fpts(ud_flex.iloc[i]['position_name'], ud_flex.iloc[i]['wk_rank_orig'], ud_flex.iloc[i]['fpts'])
    ud_flex.at[i, 'fpts_orig'] = new_fpts

Does anyone have any suggestions for how to speed up this loop? It's currently taking nearly an hour! Thanks!

Sam Hoppen

2 Answers


You could start by restructuring the function so that it returns as early as possible:

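def set_fpts(pos, rank, curr_fpts):
    # Nobody is zeroed out below rank 2, so those rows return immediately.
    if rank < 2:
        return curr_fpts
    # From rank 2 on, TE and QB are always zeroed out.
    if pos in ("TE", "QB"):
        return 0
    # RB is zeroed out from rank 3, WR from rank 4.
    if pos == "RB" and rank >= 3:
        return 0
    if pos == "WR" and rank >= 4:
        return 0
    return curr_fpts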
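Early exits only save a few comparisons per call. In the question's loop, a larger cost is probably the repeated `ud_flex.iloc[i]` lookups, each of which builds a new Series for the row. A minimal sketch of a different technique (not part of this answer as originally posted): apply the same rules to plain column values with `zip`. It assumes the column names from the question; `apply_row_wise` and the small `demo` frame are hypothetical.

```python
import pandas as pd


def set_fpts(pos, rank, curr_fpts):
    """Return 0 if the rank is at or past the position's cutoff, else curr_fpts."""
    if rank < 2:
        return curr_fpts
    if pos in ("TE", "QB"):
        return 0
    if pos == "RB" and rank >= 3:
        return 0
    if pos == "WR" and rank >= 4:
        return 0
    return curr_fpts


def apply_row_wise(df: pd.DataFrame) -> pd.Series:
    # Iterate over the raw column values instead of calling df.iloc[i] per row,
    # which avoids constructing a Series object for every row.
    values = [
        set_fpts(pos, rank, fpts)
        for pos, rank, fpts in zip(df["position_name"], df["wk_rank_orig"], df["fpts"])
    ]
    # Reuse the frame's index so the result aligns when assigned back.
    return pd.Series(values, index=df.index)


# Small made-up frame, for illustration only.
demo = pd.DataFrame(
    {
        "position_name": ["RB", "RB", "WR", "WR", "TE", "QB"],
        "wk_rank_orig": [2.0, 3.0, 3.0, 4.0, 1.0, 2.0],
        "fpts": [10.0, 8.0, 12.0, 9.0, 7.0, 20.0],
    }
)
demo["fpts_orig"] = apply_row_wise(demo)
print(demo["fpts_orig"].tolist())  # [10.0, 0.0, 12.0, 0.0, 7.0, 0.0]
```

This still runs Python code once per row, so on 27 million rows it will likely remain slower than a fully vectorized mask, but it keeps `set_fpts` unchanged.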
olanuza

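Iterating over a pandas data frame row by row is slow in general, and each `ud_flex.iloc[i]` call builds a new Series for the row, so it's not surprising that your for-loop-based approach is taking a while.

I suspect that the following vectorized alternative, which operates on whole columns at once, should be much faster for a data frame of your size.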

mask = (((ud_flex['position_name']=="RB") & (ud_flex['wk_rank_orig']>=3))
       |((ud_flex['position_name']=="WR") & (ud_flex['wk_rank_orig']>=4))
       |((ud_flex['position_name'].isin(["TE","QB"])) & (ud_flex['wk_rank_orig']>=2)))
# Start from fpts, then zero out the masked rows with a single .loc assignment
# (avoids chained indexing such as ud_flex['fpts_orig'][mask] = 0).
ud_flex['fpts_orig'] = ud_flex['fpts']
ud_flex.loc[mask, 'fpts_orig'] = 0
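The same rule can also be written with all the cutoffs in one place. A minimal sketch, not the answerer's code: store the rank at which each position is zeroed out in a dict (the hypothetical `ZERO_FROM_RANK`), look it up per row with `Series.map`, and zero out matching rows with `Series.mask`. Column names follow the question; `set_fpts_vectorized` and the `demo` frame are made up for illustration.

```python
import pandas as pd

# Rank at which each position's points are zeroed out
# (same cutoffs as the question's set_fpts).
ZERO_FROM_RANK = {"RB": 3, "WR": 4, "TE": 2, "QB": 2}


def set_fpts_vectorized(df: pd.DataFrame) -> pd.Series:
    # Per-row cutoff; positions missing from the dict get NaN.
    cutoff = df["position_name"].map(ZERO_FROM_RANK)
    # Comparisons against NaN are False, so unlisted positions keep their points.
    zero_out = df["wk_rank_orig"] >= cutoff
    # mask() replaces values where the condition is True.
    return df["fpts"].mask(zero_out, 0)


# Small made-up frame, for illustration only ("K" is not in the dict).
demo = pd.DataFrame(
    {
        "position_name": ["RB", "RB", "WR", "WR", "TE", "QB", "K"],
        "wk_rank_orig": [2.0, 3.0, 3.0, 4.0, 1.0, 2.0, 5.0],
        "fpts": [10.0, 8.0, 12.0, 9.0, 7.0, 20.0, 5.0],
    }
)
demo["fpts_orig"] = set_fpts_vectorized(demo)
print(demo["fpts_orig"].tolist())  # [10.0, 0.0, 12.0, 0.0, 7.0, 0.0, 5.0]
```

With this layout, supporting another position means adding one dict entry rather than another `|` clause to the mask.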
Ben Grossmann