
I used DataFrame.corr() to compute the correlations between all of my input variables in a 5000x1000 DataFrame:

correlation_matrix = df.corr()
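A small reproduction can help separate a data problem from a computation problem. The sketch below is only an illustration: it assumes a hypothetical, much smaller DataFrame of random float64 values, not the question's data. It prints the column dtypes (one of the things requested in the comments). It then compares `df.corr()`, which computes Pearson correlation by default, against NumPy's `np.corrcoef`. Two float64 implementations of the same formula typically agree only up to rounding, not bit for bit.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's 5000x1000 data: smaller and random.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 20)))

# Column data types (corr() expects numeric columns).
print(df.dtypes.value_counts())

correlation_matrix = df.corr()  # Pearson correlation by default

# Independent computation of the same matrix; columns are the variables.
reference = np.corrcoef(df.to_numpy(), rowvar=False)
max_abs_diff = np.abs(correlation_matrix.to_numpy() - reference).max()
print(f"largest disagreement between pandas and NumPy: {max_abs_diff:.3e}")
```

On well-behaved data like this, the disagreement should sit at the level of float64 rounding noise.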

When I check the largest value in the correlation matrix, I get a number slightly greater than 1:

correlation_matrix.max().max()  # returns 1.000029
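One way to judge whether an overshoot like this is plausible rounding is to express it in units of machine epsilon. Machine epsilon is the gap between 1.0 and the next representable number, available as `np.finfo(dtype).eps`. The helper below is hypothetical and only does that arithmetic on the reported value. An overshoot of 2.9e-5 is about 1.3e11 float64 steps but only about 243 float32 steps. If the data is float64, this is far more than a single rounding error. A plausible explanation, not confirmed by the question, is error accumulated over many operations. This is one reason the data types matter.

```python
import numpy as np

def excess_in_eps(value, dtype=np.float64):
    """Hypothetical helper: how many machine epsilons `value` lies above 1.0."""
    eps = float(np.finfo(dtype).eps)  # spacing between 1.0 and the next representable number
    return (value - 1.0) / eps

print(excess_in_eps(1.000029))              # about 1.3e11 steps for float64
print(excess_in_eps(1.000029, np.float32))  # about 243 steps for float32
```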

To investigate further, I counted the entries greater than 1:

# Count the entries of the correlation matrix that exceed 1.0
counter = 0
for i in range(len(correlation_matrix)):              # row positions
    for j in range(len(correlation_matrix.columns)):  # column positions
        if correlation_matrix.iloc[i, j] > 1.0:
            counter += 1

It turns out that around 100 entries are slightly greater than 1, which should not be possible for a correlation coefficient. What could cause this?
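The nested loop can be replaced by a vectorized comparison. It also helps to list which variable pairs are affected so their values can be inspected. A correlation matrix is symmetric, so an off-diagonal entry above 1 normally appears twice, once above and once below the diagonal. The sketch below uses a hypothetical helper, `pairs_above_one`, together with a small made-up matrix. With `upper_only=True` it reports each pair once, keeping the diagonal so that self-correlations above 1 are still caught.

```python
import numpy as np
import pandas as pd

def pairs_above_one(corr: pd.DataFrame, upper_only: bool = True):
    """Return (row label, column label, value) for entries greater than 1.0.

    With upper_only=True, each symmetric pair is reported once (diagonal kept).
    """
    values = corr.to_numpy()
    mask = values > 1.0
    if upper_only:
        mask = np.triu(mask)  # keep the diagonal and the entries above it
    rows, cols = np.nonzero(mask)
    return [(corr.index[r], corr.columns[c], float(values[r, c])) for r, c in zip(rows, cols)]

# Tiny made-up matrix with one off-diagonal pair slightly above 1.
corr = pd.DataFrame(
    [[1.0, 1.000029, 0.5], [1.000029, 1.0, 0.2], [0.5, 0.2, 1.0]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)
print(int((corr > 1.0).to_numpy().sum()))  # 2, the same count the nested loop gives
print(pairs_above_one(corr))                # [('a', 'b', 1.000029)]
```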

  • Please post the datatypes and some sample values from two vars which have corr > 1. – Dave May 22 '20 at 14:58
  • The reason could be the representation of floating-point values. See https://stackoverflow.com/questions/588004/is-floating-point-math-broken – jkr May 22 '20 at 14:58
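If the overshoot is rounding noise, a common remedy is to clip the matrix back into [-1, 1]. NumPy's own `np.corrcoef` does this, and its documentation notes that floating-point rounding can push entries outside that interval. Clipping blindly would also hide a genuine data problem. The hypothetical helper below therefore clips only when the overshoot is below a tolerance. The default of `1e-4` is an assumption for illustration, not a standard value. It also shows the classic representation error from the linked question.

```python
import numpy as np
import pandas as pd

def clip_correlation(corr: pd.DataFrame, tolerance: float = 1e-4) -> pd.DataFrame:
    """Clip rounding overshoot back into [-1, 1].

    `tolerance` is a hypothetical threshold: a larger overshoot is treated as a
    real problem (e.g. suspicious input data) rather than rounding noise.
    """
    overshoot = np.nanmax(np.abs(corr.to_numpy())) - 1.0
    if overshoot > tolerance:
        raise ValueError(f"correlation exceeds 1 by {overshoot:.3g}; inspect the input data")
    return corr.clip(lower=-1.0, upper=1.0)

# Classic floating-point representation error from the linked question.
print(0.1 + 0.2 == 0.3)  # False

corr = pd.DataFrame([[1.0, 1.000029], [1.000029, 1.0]], index=["a", "b"], columns=["a", "b"])
cleaned = clip_correlation(corr)
print(cleaned.to_numpy().max())  # 1.0
```

Whether 2.9e-5 should count as "noise" depends on the data. A stricter tolerance would raise for the question's value instead of hiding it.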
