I have a dataframe with dates (30/09/2022 to 31/11/2022) and the prices of 15 stocks (only 5 are shown here for reference) for each of these dates (excluding weekends).
Current Data:
DATES | A | B | C | D | E |
30/09/22 |100.5|151.3|233.4|237.2|38.42|
01/10/22 |101.5|148.0|237.6|232.2|38.54|
02/10/22 |102.2|147.6|238.3|231.4|39.32|
03/10/22 |103.4|145.7|239.2|232.2|39.54|
I wanted to get the Pearson correlation matrix, so I did this:
import numpy as np
import pandas as pd

df = pd.read_excel(file_path, sheet_name)
df = df.dropna()  # Remove dates that do not have prices for all stocks
# Daily log returns: log(P_t / P_{t-1})
log_df = df.set_index("DATES").pipe(lambda d: np.log(d.div(d.shift()))).reset_index()
corrM = log_df.corr()
Now I want to build the uncentered Pearson correlation matrix, so I wrote the following function:
def uncentered_correlation(x, y):
    x_dim = len(x)
    y_dim = len(y)
    xy = 0
    xx = 0
    yy = 0
    for i in range(x_dim):
        xy = xy + x[i] * y[i]
        xx = xx + x[i] ** 2.0
        yy = yy + y[i] ** 2.0
    corr = xy / np.sqrt(xx * yy)
    return corr
However, I do not know how to apply this function to each possible pair of columns of the dataframe to get the correlation matrix.
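
For illustration, here is a minimal sketch of one way the pairwise computation could be done in a single step with matrix algebra instead of looping over column pairs. It uses the same formula as `uncentered_correlation` above. The helper name `uncentered_corr_matrix` and the small `prices` frame (the four sample rows shown earlier) are hypothetical, and the sketch assumes the dates are used as the index so that only numeric columns remain.

```python
import numpy as np
import pandas as pd


def uncentered_corr_matrix(returns: pd.DataFrame) -> pd.DataFrame:
    """Pairwise uncentered correlation: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    clean = returns.dropna()             # drop the NaN first row left by shift()
    values = clean.to_numpy(dtype=float)
    cross = values.T @ values            # cross[i, j] = sum over dates of x_i * x_j
    norms = np.sqrt(np.diag(cross))      # sqrt of each column's sum of squares
    return pd.DataFrame(
        cross / np.outer(norms, norms),
        index=clean.columns,
        columns=clean.columns,
    )


# Hypothetical sample data: the four rows from the table above.
prices = pd.DataFrame(
    {
        "A": [100.5, 101.5, 102.2, 103.4],
        "B": [151.3, 148.0, 147.6, 145.7],
        "C": [233.4, 237.6, 238.3, 239.2],
        "D": [237.2, 232.2, 231.4, 232.2],
        "E": [38.42, 38.54, 39.32, 39.54],
    },
    index=pd.Index(["30/09/22", "01/10/22", "02/10/22", "03/10/22"], name="DATES"),
)

log_returns = np.log(prices / prices.shift())
uncentered = uncentered_corr_matrix(log_returns)
print(uncentered.round(4))
```

Another option that may work is passing the function directly to pandas, e.g. `log_df.set_index("DATES").corr(method=uncentered_correlation)`, since `DataFrame.corr` accepts a callable that takes two 1-D arrays and returns a float. In that case pandas sets the diagonal to 1 itself and drops rows where either column is NaN for each pair.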