1

I have a data frame like the one below:

          index       Player      Team      Matchup   Game_Date WL   Min   PTS   FGM   FGA   FG%  3PM  3PA   3P%  FTM   FTA   FT%  OREB  DREB   REB   AST  STL  BLK  TOV    PF  Plus_Minus  Triple_Double  Double_Double    FPT   2PA   2PM         2P% Home_Away
276100      1           John Long  TOR    TOR @ BOS  04/20/1997  W   6.0   0.0   0.0   3.0   0.0  0.0  1.0   0.0  0.0   0.0     0   0.0   0.0   0.0   1.0  0.0  0.0  0.0   0.0         2.0            0.0            0.0   1.50   2.0   0.0    0.000000      Away
276101      2       Walt Williams  TOR    TOR @ BOS  04/20/1997  W  29.0   7.0   3.0   9.0  33.3  1.0  2.0  50.0  0.0   0.0     0   3.0   3.0   3.0   2.0  2.0  1.0  1.0   3.0        20.0            0.0            0.0  19.75   7.0   2.0   28.571429      Away
276102      3            Todd Day  BOS  BOS vs. TOR  04/20/1997  L  36.0  22.0   8.0  17.0  47.1  4.0  8.0  50.0  2.0   2.0   100   8.0   8.0   6.0   4.0  0.0  0.0  3.0   8.0       -21.0            0.0            0.0  36.00   9.0   4.0   44.444444      Home
276103      4       Doug Christie  TOR    TOR @ BOS  04/20/1997  W  39.0  27.0   8.0  19.0  42.1  3.0  9.0  33.3  8.0   8.0   100   8.0   8.0   1.0   5.0  3.0  1.0  0.0   8.0        30.0            0.0            0.0  45.25  10.0   5.0   50.000000      Away
276104      5         Brett Szabo  BOS  BOS vs. TOR  04/20/1997  L  25.0   5.0   1.0   4.0  25.0  0.0  0.0     0  3.0   4.0  75.0   1.0   1.0   3.0   1.0  0.0  0.0  0.0   1.0       -11.0            0.0            0.0  10.25   4.0   1.0   25.000000      Home

I would like to add a new column that takes each of the old columns and gives its x day moving average. However, I want the moving average for each unique person. For example, John Long could play several hundred games each played on a unique date. I want his moving average numbers to reflect only his performances. I've looked at the df.rolling() function in pandas and I don't know how to make it so it looks at each player individually. Any help would be appreciated.

          Name    Date  Points  MA
0    Joe Smith  1-1-19      10  NA
1  Sam Simmons  1-1-19      20  NA
2    Joe Smith  1-2-19      30  20
3  Sam Simmons  1-2-19      40  30
bballboy8
  • 400
  • 6
  • 25
  • 1
    What is expected output from sample data? – jezrael Jan 14 '19 at 06:28
  • A new column that would have the moving averages associated with each player specifically. For example, there would be a new column that was the past 5 game moving average for points which had the mean for the previous 5 games for that speicific player. For example, John Long would have the average of his 5 pervious games, Walt Williams would have the average of his 5 previous games, etc. – bballboy8 Jan 14 '19 at 06:31
  • yes, my question was about [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and add how looks new column - what numbers from sample data for verify my solution. – jezrael Jan 14 '19 at 06:32
  • I will have to manually calculate it so give me some time. – bballboy8 Jan 14 '19 at 06:58
  • ya, be free only calculate 2, 3 values, but mainly change data for better sample. – jezrael Jan 14 '19 at 06:59
  • A very basic example has been added to the question for a 2 moving average time period. Let me know if anything is unclear or if you need anything else to understand my question. – bballboy8 Jan 15 '19 at 05:04

2 Answers2

0

You can use groupby with rolling and mean, then add new column by join:

df['Date'] = pd.to_datetime(df['Date'], format='%m-%d-%y')

s = df.set_index('Date').groupby('Name')['Points'].rolling(2, freq='D').mean().rename('MA')

df = df.join(s, on=['Name','Date'])
print (df)
          Name       Date  Points    MA
0    Joe Smith 2019-01-01      10   NaN
1  Sam Simmons 2019-01-01      20   NaN
2    Joe Smith 2019-01-02      30  20.0
3  Sam Simmons 2019-01-02      40  30.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Cool answer as usual. I'm a big fan :) Just to study this further, is the freq argument introduced in newer versions of the rolling function? I can' t find it in the documentation https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html – kerwei Jan 15 '19 at 08:30
  • @kerwei - Thank you, it is describe [here](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#groupby-syntax-with-window-and-resample-operations) – jezrael Jan 15 '19 at 08:32
  • Also, I think the OP is looking for a non-static rolling window if I'm not mistaken. Is this requirement taken care of by the freq argument? Otherwise, I suppose the size of the window is 2? – kerwei Jan 15 '19 at 08:33
  • @kerwei - there is parameter `freq='D'`, it means 2 days. – jezrael Jan 15 '19 at 08:33
  • Would this continue to work if the days weren't consecutive? – bballboy8 Jan 15 '19 at 17:33
  • @bballboy8 - not understand, do you think missing weekends or similar? Yes, it will working, but best test it. – jezrael Jan 16 '19 at 08:27
  • Thank you for the solution. However, for time periods of more than 5 days, I receive a memory error. Why would this be? – bballboy8 Jan 20 '19 at 22:29
0

Drawing inspiration from @jezrael's answer above, as well as the answer to another question here, here's a solution for running average by player - without the date window size constraint.

# Get the running count of Names, sorted by Date, Name
df['NameCount'] = df.sort_values(['Date','Name'], ascending=True).groupby('Name').cumcount() + 1
# Running sum of points, in the same order as above (important)
df['PointSum'] = df.sort_values(['Name','NameCount'], ascending=True).groupby('Name')['Points'].cumsum()
df['MA'] = df['PointSum']/df['NameCount']
# Drop the unneeded columns
df = df.drop(['NameCount', 'PointSum'], axis=1)

cumcount() method provided by @MaxU here, as an emulation of the SQL's row number, partition by method

kerwei
  • 1,822
  • 1
  • 13
  • 22