4

I am having a problem trying to implement the 'rolling' functions in Pandas (i.e. rolling_std() and rolling_corr()) when using the group by functions. I have tried using the below formulas but I keep getting 'ValueError: cannot reindex from a duplicate axis'.

df is my dataframe:

df.groupby(level='ID')['Val1'].apply(lambda x: pd.rolling_std(x,30))

What I have:

ID  Date    Val1    Val2
A   1-Jan   45      22
A   2-Jan   15      66
A   3-Jan   55      13
B   1-Jan   41      12
B   2-Jan   87      45
B   3-Jan   82      66
C   1-Jan   33      34
C   2-Jan   15      67
C   3-Jan   46      22

What I need:

ID  Date    Val1    Val2    Rolling_Corr    Rolling_Std (Val1)
A   1-Jan   45  22      
A   2-Jan   15  66  0.1 1.2
A   3-Jan   55  13  0.16    2.5
B   1-Jan   41  12      
B   2-Jan   87  45  0.15    2.8
B   3-Jan   82  66  0.05    1.1
C   1-Jan   33  34      
C   2-Jan   15  67  0.09    1.5
C   3-Jan   46  22  0.11    2.4

Thanks

backtrack
  • 7,996
  • 5
  • 52
  • 99
Gohawks
  • 1,044
  • 3
  • 12
  • 26
  • I am getting closer. The above function for rolling_std actually works now. I had to re-index my dataframe. df['roll_corr'] = df.groupby(level='ID').apply(lambda x: pd.rolling_corr(x['Val1'],x['Val2'],2)) When I add the function for rolling_corr(), it does not return any errors but it also returns all null values. Any ideas? – Gohawks Mar 12 '15 at 13:27
  • This works but it is not grouped by 'ID': df['roll_corr'] = pd.rolling_corr(df['Val1'],df['Val2'],2). I can't get it to work with the .apply function either. – Gohawks Mar 12 '15 at 15:29
  • 2
    Hi Brendan, did you solve this issue? If so, can you please share your solution. – Stergios Mar 17 '16 at 15:18

1 Answers1

0

In newer versions of pandas the syntax of rolling has changed, for example, from rolling_std() to rolling().std() and works well when combined with groupby:

df.groupby('ID').rolling(2).std()

     ID   Date       Val1       Val2
ID                                  
A  0  A  1-Jan        NaN        NaN
   1  A  2-Jan  21.213203  31.112698
   2  A  3-Jan  28.284271  37.476659
B  3  B  1-Jan        NaN        NaN
   4  B  2-Jan  32.526912  23.334524
   5  B  3-Jan   3.535534  14.849242
C  6  C  1-Jan        NaN        NaN
   7  C  2-Jan  12.727922  23.334524
   8  C  3-Jan  21.920310  31.819805

Or see the following for corr which is similar but slightly more complicated due to the verbose output of corr:

Rolling Correlation with Groupby in Pandas

JohnE
  • 29,156
  • 8
  • 79
  • 109