2

I have difficulty to use pd.rolling_mean function on the following data frame containing duplicated indices:

               amount
    20140101    3
    20140102    4
    20140103    3
    20140103    5
    20140103    1
    20140104    5
    20140105    6
    20140106    2
    …

I need to calculate the 3 day average of 'amount', for example, average from 20140101 to 20140103 should be (3+4+3+5+1)/5=3.2, the average of amount from 20140104 to 20140106 should be (5+6+2)/3=4.3

does anyone know how to do it? Thank you in advance!

Mazdak
  • 105,000
  • 18
  • 159
  • 188
user6396
  • 1,832
  • 6
  • 23
  • 38

2 Answers2

1

you can do:

>>> df
          amount
20140101       3
20140102       4
20140103       3
20140103       5
20140103       1
20140104       5
20140105       6
20140106       2
>>> xf = df.groupby(level=0)['amount'].agg(['sum', 'count'])
>>> xf
          sum  count
20140101    3      1
20140102    4      1
20140103    9      3
20140104    5      1
20140105    6      1
20140106    2      1
>>> pd.rolling_sum(xf['sum'], 3, 0) / pd.rolling_sum(xf['count'], 3, 0)
20140101    3.000
20140102    3.500
20140103    3.200
20140104    3.600
20140105    4.000
20140106    4.333
dtype: float64

and you get 3.2 and 4.3 for 20140103 and 20140106 respectively.

behzad.nouri
  • 74,723
  • 18
  • 126
  • 124
1

If your date column is already a datetime and is the index you can just call rolling_mean:

In [15]:

pd.rolling_mean(df['amount'], window=1,  freq='3d')
Out[15]:
date
2014-01-01    3.200000
2014-01-04    4.333333
Freq: 3D, dtype: float64

If required you can convert your index to a datetime by doing the following:

df.index = pd.to_datetime(df.index.astype(str), '%Y%m%d')

This is in case the index is in fact of dtype int64, if it's already a string then you can ignore the astype bit

EdChum
  • 376,765
  • 198
  • 813
  • 562
  • Didn't realize the usage of 'freq' until now, thank you! it won't work on data frames with irregular space which is the case of my actual data frames. – user6396 Feb 21 '15 at 23:32