I work with large data sheets and am trying to compute the correlation between every pair of columns.
I currently do this with:
df = df.rolling(5).corr(pairwise = True)
This produces data like this:
477
s1 -0.240339 0.932141 1.000000 0.577741 0.718307 -0.518748 0.772099
s2 0.534848 0.626280 0.577741 1.000000 0.645064 -0.455503 0.447589
s3 0.384720 0.907782 0.718307 0.645064 1.000000 -0.831378 0.406054
s4 -0.347547 -0.651557 -0.518748 -0.455503 -0.831378 1.000000 -0.569301
s5 -0.315022 0.576705 0.772099 0.447589 0.406054 -0.569301 1.000000
One such table is produced for each row in the data set. Here 477 is the row number (index), and s1 to s5 are the column titles.
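To make the layout of that output concrete, here is a minimal sketch on made-up random data. The column names s1 to s5 follow the question; the values, the 10-row length, and the names `demo`, `demo_corr`, and `last_matrix` are assumptions for illustration only. The result of `rolling(5).corr(pairwise=True)` is indexed by (row label, column name), so the matrix for a single row can be pulled out with `.loc`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 rows of random readings for five sensors.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(10, 5)),
                    columns=["s1", "s2", "s3", "s4", "s5"])

# One 5x5 correlation matrix per row, stacked under a two-level index
# (row label, column name). The first 4 rows are NaN (window not yet full).
demo_corr = demo.rolling(5).corr(pairwise=True)

# The matrix for the window that ends at row label 9.
last_matrix = demo_corr.loc[9]
print(last_matrix.round(3))
```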
The goal is to find when the sensors are highly correlated with each other. I want to do this by (a) calculating the correlation over a rolling window of 5 rows using the code above, and (b) for each row, i.e. i = 0 to i = 499 for a 500-row Excel sheet, summing the table that dataframe.rolling(5).corr() produces for that value of i. This should give one value per unit of time, as in the graph included at the bottom. I am new to Stack Overflow, so please let me know if there is more information I can provide.
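One way steps (a) and (b) could be sketched in pandas is shown here. This is an illustration under assumptions, not a confirmed solution: the data are random placeholders, and the names `corr`, `row_sums`, `nv`, and `cact` are chosen for this sketch (`nv` and `cact` echo the MATLAB snippet further down). It sums every entry of each window's correlation matrix and subtracts the number of columns, which removes the diagonal of ones.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder data: 20 rows, 3 columns.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(20, 3)), columns=["col1", "col2", "col3"])

nv = df.shape[1]                          # number of columns (variables)
corr = df.rolling(5).corr(pairwise=True)  # step (a): one matrix per row

# Step (b): sum across each matrix row, then across the rows that belong
# to the same window. min_count=1 keeps incomplete windows as NaN
# instead of turning them into 0.
row_sums = corr.sum(axis=1, min_count=1)
cact = row_sums.groupby(level=0).sum(min_count=1) - nv

print(cact)
```

pandas labels each window by its last row, so the value at label 4 covers rows 0 to 4 and the first four entries are NaN; `cact.dropna()` leaves one value per complete window.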
Example code + data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
d = {
    'col1': [-2587.944231, -1897.324231, -2510.304231, -2203.814231, -2105.734231,
             -2446.964231, -2963.904231, -2177.254231, 2796.354231, -2085.304231],
    'col2': [-3764.468462, -3723.608462, -3750.168462, -3694.998462, -3991.268462,
             -3972.878462, 3676.608462, -3827.808462, -3629.618462, -1841.758462],
    'col3': [-166.1357692, -35.36576923, 321.4157692, 108.9257692, -123.2257692,
             -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308],
}
df = pd.DataFrame(data=d)
dfn = df.rolling(5).corr(pairwise = True)
MATLAB code which accomplishes what I want:
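[ns, nv] = size(X); % assumed: ns rows (samples), nv columns (variables)
% move through the data and get a correlation matrix for each window of 5 data points
for i = 1:ns-4
    C(:,:,i) = corrcoef(X(i:i+4,:));
    cact(i) = sum(C(:,:,i), 'all') - nv; % subtracting nv removes the diagonal entries, which are always 1 and don't change
end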
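For comparison, a near line-by-line NumPy translation of that loop might look like the sketch below. It uses the example data from the Python snippet above and assumes `X` holds one column per variable; the name `window` is introduced here, and unlike the MATLAB version it keeps only the summed value for each window rather than storing every matrix.

```python
import numpy as np

# Example data from the question, one column per variable.
X = np.array([
    [-2587.944231, -1897.324231, -2510.304231, -2203.814231, -2105.734231,
     -2446.964231, -2963.904231, -2177.254231, 2796.354231, -2085.304231],
    [-3764.468462, -3723.608462, -3750.168462, -3694.998462, -3991.268462,
     -3972.878462, 3676.608462, -3827.808462, -3629.618462, -1841.758462],
    [-166.1357692, -35.36576923, 321.4157692, 108.9257692, -123.2257692,
     -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308],
]).T

ns, nv = X.shape   # rows (samples), columns (variables)
window = 5

cact = np.empty(ns - window + 1)
for i in range(ns - window + 1):
    # rowvar=False treats columns as variables, like MATLAB's corrcoef.
    C = np.corrcoef(X[i:i + window], rowvar=False)
    # Subtracting nv removes the diagonal entries, which are always 1.
    cact[i] = C.sum() - nv

print(cact)
```

Here `cact[0]` corresponds to MATLAB's `cact(1)`, i.e. the window covering the first five rows.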
For the original data, this is the graph I am trying to produce in Python, with time on the x axis: [image: Correlation Graph]