2

I have a dataframe df like below.

dates = pd.date_range('2000-01-01', '2001-01-01')
df1 = pd.DataFrame({'date':dates, 'value':np.random.normal(size = len(dates)), 'market':'GOLD'})
df2 = pd.DataFrame({'date':dates, 'value':np.random.normal(size = len(dates)), 'market':'SILVER'})
df = pd.concat([df1, df2])
df = df.sort('date')

          date  market     value
0   2000-01-01    GOLD -1.361360
0   2000-01-01  SILVER  0.255830
1   2000-01-02  SILVER  0.196953
1   2000-01-02    GOLD  1.422454
2   2000-01-03    GOLD -0.827672
...

I want to add another column as the 10d moving average of value, for each market.

Is there a simple df.groupby('market').??? that can achieve this? Or do I have to pivot the table to wide form, smooth each column, then melt back?

jf328
  • 6,841
  • 10
  • 58
  • 82

2 Answers2

4

You could use groupby/rolling/mean:

result = (df.set_index('date')
            .groupby('market')['value']
            .rolling(10).mean()
            .unstack('market'))

yields

market          GOLD    SILVER
date                          
2000-01-01       NaN       NaN
2000-01-02       NaN       NaN
2000-01-03       NaN       NaN
2000-01-04       NaN       NaN
2000-01-05       NaN       NaN
2000-01-06       NaN       NaN
2000-01-07       NaN       NaN
2000-01-08       NaN       NaN
2000-01-09       NaN       NaN
2000-01-10  0.310077  0.582063
2000-01-11  0.312008  0.752218
2000-01-12  0.151159  0.877230
2000-01-13  0.213611  0.742156
2000-01-14  0.440113  0.614720
2000-01-15  0.551360  0.649967
...
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • Then if you want to merge them into a single column: `df['avg'] = avg.GOLD.where(df.market == 'GOLD', avg.SILVER)`. – John Zwinck Aug 30 '16 at 12:35
1

This builds on @unutbu's answer, and adds the results back to the original dataframe as a new column.

result = df.set_index('date').groupby('market')['value'].rolling(10).mean()

Now if df is sorted by market first and then date, the results should be in sync and we can just assign back the values

df.sort_values(['market','date'], inplace = True)
df['value10d_1'] = result.values

However, if you are paranoid as I am, merge should give peace of mind,

df = pd.merge(df, result.reset_index().rename(columns = {'value':'value10d_2'}), on = ['market','date'])

df['value10d_1'] - df['value10d_2'] # all 0
jf328
  • 6,841
  • 10
  • 58
  • 82