I have a data set that contains numeric values, and I'd like to measure the correlation between its columns.
Let's consider:
import numpy as np
import pandas as pd

dataset = pd.DataFrame({'A': np.random.rand(100) * 1000,
                        'B': np.random.rand(100) * 100,
                        'C': np.random.rand(100) * 10,
                        't': np.random.rand(100)})
Mathematically, uncorrelated data means that cov(a, b) = 0. With real data, though, the sample covariance will only be close to zero.
np.cov(dataset['A'], dataset['B'])  # 2x2 covariance matrix; the off-diagonal entry is cov(A, B)
This NumPy call should give the covariance between two columns, but I'd like to make sure that my dataset is not correlated. Is there any trick to do that?
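Since the columns have very different scales, raw covariances are hard to compare with "near zero". A minimal sketch of one way to check all column pairs at once, assuming Pearson correlation is the measure of interest (the fixed seed and the variable names `corr`, `off_diag` and `max_abs_corr` are only for illustration):

```python
import numpy as np
import pandas as pd

# Same kind of data as in the question; the seed is an assumption for reproducibility
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    'A': rng.random(100) * 1000,
    'B': rng.random(100) * 100,
    'C': rng.random(100) * 10,
    't': rng.random(100),
})

# Pearson correlation matrix: cov(a, b) / (std(a) * std(b)), always within [-1, 1]
corr = dataset.corr()

# Mask the diagonal (each column correlates perfectly with itself)
off_diag = corr.where(~np.eye(len(corr), dtype=bool))

# Largest absolute correlation between two distinct columns
max_abs_corr = off_diag.abs().max().max()

print(corr.round(3))
print("largest |r| between distinct columns:", round(max_abs_corr, 3))
```

If the largest absolute off-diagonal value is small, no pair of columns is strongly linearly correlated. What counts as "small" depends on the sample size and the application, and Pearson correlation only captures linear relationships.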
UPDATE
from matplotlib.mlab import PCA
results = PCA(dataset.values)
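`matplotlib.mlab.PCA` is not available in recent Matplotlib releases, so here is a rough sketch of the same idea with NumPy only, assuming PCA is done through an eigendecomposition of the correlation matrix (the seed and the variable names are illustrative, not part of the original code):

```python
import numpy as np
import pandas as pd

# Same kind of data as in the question; the seed is an assumption for reproducibility
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    'A': rng.random(100) * 1000,
    'B': rng.random(100) * 100,
    'C': rng.random(100) * 10,
    't': rng.random(100),
})

# Correlation matrix of the columns (rowvar=False: each column is a variable)
corr = np.corrcoef(dataset.values, rowvar=False)

# The eigenvalues of the correlation matrix are the variances along the principal components
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# eigh returns eigenvalues in ascending order; reverse to get the largest first
eigenvalues = eigenvalues[::-1]
explained_ratio = eigenvalues / eigenvalues.sum()

print("explained variance ratio:", explained_ratio.round(3))
```

For uncorrelated columns the correlation matrix is close to the identity, so each component should explain roughly an equal share of the variance (about 1/4 each with four columns). One component dominating would point to correlated columns.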