1

I have a Dataframe in the below format and I am trying to create df['New'] where it is a rotating value as per below which I will use to calculate the correlation between Alpha and New

Date       Alpha Bravo Charlie   New                         Correlation
2018-01-03    1     3      2       3 (from bravo column)          NaN
2018-01-04    2     6      4       6 (from bravo column)          NaN
2018-01-05    3     9      6       9 (from bravo column)          NaN
2018-01-06    4    12      8      12 (from bravo column)          NaN
2018-01-07    5    15     10      10 (from Charlie column)         X

Next Date:

Date       Alpha Bravo Charlie   New                         Correlation
2018-01-03    1     3      2       3 (from bravo column)          NaN
2018-01-04    2     6      4       6 (from bravo column)          NaN
2018-01-05    3     9      6       9 (from bravo column)          NaN
2018-01-06    4    12      8      12 (from bravo column)          NaN
2018-01-07    5    15     10      15 (from bravo column)           X  
2018-01-08    6    18     12      12 (from Charlie column)         Y

df['Correlation'] = df['Alpha'].rolling(window=5).corr(other=df['New'])

Any advise how can I create this New column with rotating values? (Such that my previous correlation will remain unchanged as X. My final objective is to get the Correlation column and the New column is just used for calculating the correlation only)

In other words, every time the correlation column is calculated it will use the latest value as Charlie but all the previous as Bravo.

Another way of explaining this will be that is will always use the last date of Charlie column and past 4 days of bravo to calculate the correlation with Alpha as per below:

enter image description here

J Ng
  • 779
  • 7
  • 18

1 Answers1

1

I think you need first add NaNs and then this solution with strides and then get correlation matrix:

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

N = 5
a = np.concatenate([[np.nan] * (N-1), df['Bravo'].values])
b = np.concatenate([[np.nan] * (N-1), df['Alpha'].values])
a1 = rolling_window(a, N)
a2 = rolling_window(b, N)

Remove last column of a1 and add values of Charlie column:

c = np.c_[a1[:, :-1], df['Charlie'].values[:, None]] 
print (c)
[[nan nan nan nan  2.]
 [nan nan nan  3.  4.]
 [nan nan  3.  6.  6.]
 [nan  3.  6.  9.  8.]
 [ 3.  6.  9. 12. 10.]
 [ 6.  9. 12. 15. 12.]
 [ 9. 12. 15. 18. 15.]]

Create DataFrames and remove first NaNs rows by iloc:

a = pd.DataFrame(a2, index=df.index).iloc[N-1:]
b = pd.DataFrame(c, index=df.index).iloc[N-1:]
df['Correlation1'] = a.corrwith(b, axis=1)
#for improve performance
#https://stackoverflow.com/a/41703623/2901002
df['Correlation2'] = corr2_coeff_rowwise(a2, c)

print (df)
        Date  Alpha  Bravo  Charlie  Correlation1  Correlation2
0 2018-01-03      1      3        2           NaN           NaN
1 2018-01-04      2      6        4           NaN           NaN
2 2018-01-05      3      9        6           NaN           NaN
3 2018-01-06      4     12        8           NaN           NaN
4 2018-01-07      5     15       10      0.894427      0.894427
5 2018-01-08      6     18       12      0.832050      0.832050
6 2018-01-09      7     21       15      0.832050      0.832050
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Thank you Jezrael, you saved me again, if you ever come to my country (Singapore) you should tell me and I will bring you around to eat some nice food :D – J Ng Apr 01 '18 at 13:13
  • @JNg - Thank you very much for it! I hope sometimes visit Singapure :) – jezrael Apr 01 '18 at 13:16