1

I have a pandas dataframe that looks like

Name    Date               Value
Sarah   11-01-2015         3
Sarah   11-02-2015         2
Sarah   11-03-2015         27
Bill    11-01-2015         42
Bill    11-02-2015         5
Bill    11-03-2015         15
.... (a couple hundred rows)

How do I get a 30 day (or x day) rolling sum of these values broken out by whoever is in the 'Name' column? The ideal output would have the same columns as the current dataframe, but instead of having the values for each row be what that person had as a value for that day, it would be the cumulative sum of what their values over the past 30 days.

I know I can do

result = pd.rolling_sum(df, 30)

to get the rolling sum overall. But how do I return a dataframe with that rolling sum grouped by the 'Name' column?

John
  • 1,405
  • 2
  • 14
  • 21
  • Just a few days ago there was a [bounty question](http://stackoverflow.com/questions/33442504/python-dataframe-rolling-sum-with-numbers-not-date/33447155#33447155) for a similar (nearly exact) python rolling sum by group but for last 7 days. My answer suggested using SQL to do this and avoid having pandas restructuring it in memory. If your data frame derives from a database, consider such a route. – Parfait Nov 19 '15 at 02:06
  • Ah, thanks for the heads up @Parfait ! – John Nov 20 '15 at 02:38

2 Answers2

1

Figured it out using the grigri group_resample function.

df = group_resample(df,date_column='Date',groupby=group_by,value_column='Value',how='sum',freq='d')
df = df.unstack(group_by).fillna(0)
result = pd.rolling_mean(df,30)
John
  • 1,405
  • 2
  • 14
  • 21
0

Note that if you don't need a precise temporal window, or if your dataset has 1 line per [day , user] (which seems to be your case), then the standard groupby of pandas is perfectly suited. See this very similar question

Otherwise, something like:

df.groupby('Name').rolling('30D', on="Date").Value.sum()

should work.

Community
  • 1
  • 1
PSAfrance
  • 11
  • 2