How to calculate the ewm correlation coefs after groupby

Question

For example, I have the following CSV data (in practice there is more than one group in G):

G,T,x,y
g,1,3,4
g,2,4,5
g,3,6,1
g,4,7,2
g,5,8,3
g,6,9,8

I want to calculate the exponentially weighted correlation coefficient between x and y within each group. The result I expect is:

G T     namedWhatever
g 1         NaN
g 2    1.000000
g 3   -0.867510
g 4   -0.792758
g 5   -0.510885
g 6    0.413379

which can actually be calculated for a single group by:

dat.loc['g'].ewm(halflife=3).corr().loc[:, 'x', 'y']
Out[5]: 
T
1         NaN
2    1.000000
3   -0.867510
4   -0.792758
5   -0.510885
6    0.413379
Name: y, dtype: float64

What I have tried without luck:

In [3]: dat = pd.read_csv('test.csv').set_index(['G', 'T'])

In [4]: dat.groupby(level='G').transform(lambda x: x.ewm(halflife=3).corr())
Out[4]: 
       x    y
G T          
g 1  NaN  NaN
  2  1.0  1.0
  3  1.0  1.0
  4  1.0  1.0
  5  1.0  1.0
  6  1.0  1.0

What's the right way to do it? I am using pandas 0.19.2 and Python 3.6.

Here are a few vectorized solutions: http://stackoverflow.com/questions/42869495 - Divakar, Apr 26 '17 at 09:22

score 1 · Accepted Answer · edited May 23 '17 at 12:25

The problem is that corr returns the full correlation matrix, so calling ewm(...).corr() on a DataFrame returns a Panel holding one 2 x 2 matrix per row. You need to extract the off-diagonal element of each matrix to get the correlation coefficient between x and y.
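
As a side note on the attempt in the question: the all-1.0 output is what you would expect if corr() ends up being evaluated on each column by itself, because a Series correlated with itself is always 1. A minimal sketch, assuming a recent pandas and rebuilding the x column from the question inline (the variable names are only illustrative):

```python
import pandas as pd

# The x column of group g from the question, indexed by T.
x = pd.Series([3, 4, 6, 7, 8, 9], index=pd.Index(range(1, 7), name="T"), name="x")

# With no `other` argument, the Series is correlated with itself:
# the first value is NaN (one observation), every later value is 1.0.
self_corr = x.ewm(halflife=3).corr()
print(self_corr)
```

So the fix has to make sure both x and y reach corr() together, which is what the solutions below do.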

An explicit solution with a loop is:

res = pd.concat([el.ewm(halflife=3).corr().xs('x', axis=1).loc['y', :] for key, el in dat.groupby(level='G')])

This is clearer if you inspect el.ewm(halflife=3).corr():

el.ewm(halflife=3).corr()
Out[54]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 2 (major_axis) x 2 (minor_axis)
Items axis: (g, 1) to (g, 6)
Major_axis axis: x to y
Minor_axis axis: x to y
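
Note that Panel was removed in pandas 0.25, so the output above is specific to older versions such as the 0.19.2 used in the question. On recent releases the same call should return a DataFrame with a MultiIndex on the rows instead (the original index plus an inner level holding the column names). A minimal sketch of extracting the x-y coefficient in that case, assuming a recent pandas and a single group indexed by T only:

```python
import pandas as pd

# One group from the question, indexed by T only.
el = pd.DataFrame(
    {"x": [3, 4, 6, 7, 8, 9], "y": [4, 5, 1, 2, 3, 8]},
    index=pd.Index(range(1, 7), name="T"),
)

# Pairwise result: one 2x2 correlation matrix per T, stacked along the rows.
corr = el.ewm(halflife=3).corr()

# Keep the rows labelled 'x' in the inner level, then take the 'y' column:
# that is the off-diagonal element, i.e. corr(x, y), for each T.
xy = corr.xs("x", level=1)["y"]
print(xy)
```

The idea is the same as with the Panel: pick the off-diagonal entry of each per-row matrix.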

Following this answer, I realised you can avoid the loop by using the same expression inside apply, rather than transform, on the grouped object:

dat.groupby(level='G').apply(lambda x: x.ewm(halflife=3).corr().xs('x', axis=1).loc['y', :]).T

In both cases, I obtain the expected output:

res
Out[55]: 
G  T
g  1         NaN
   2    1.000000
   3   -0.867510
   4   -0.792758
   5   -0.510885
   6    0.413379
Name: y, dtype: float64
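
For readers on a pandas version without Panel, here is a hedged sketch of the loop variant that avoids the correlation matrix altogether by passing y as the `other` argument of Series.ewm(...).corr(). It assumes a recent pandas; the second group "h" and its values are made up purely to show more than one group:

```python
import pandas as pd

dat = pd.DataFrame(
    {
        "G": ["g"] * 6 + ["h"] * 3,  # group "h" is hypothetical, for illustration
        "T": [1, 2, 3, 4, 5, 6, 1, 2, 3],
        "x": [3, 4, 6, 7, 8, 9, 1, 2, 3],
        "y": [4, 5, 1, 2, 3, 8, 2, 4, 7],
    }
).set_index(["G", "T"])

# For each group, correlate x with y directly; each piece keeps its (G, T) index,
# so concatenating the pieces gives one Series for all groups.
res = pd.concat(
    [el["x"].ewm(halflife=3).corr(el["y"]) for _, el in dat.groupby(level="G")]
)
print(res.loc["g"])
```

This still loops over the groups, but each iteration only computes the single coefficient that is needed.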

Your solution is right; however, I want to avoid using the for loop. - Eastsun, Apr 26 '17 at 09:02
