I used DataFrame.corr() to compute the correlations between all of my input variables in a 5000x1000 matrix:
correlation_matrix = df.corr()
When I check the largest value in the correlation matrix, I get a number slightly greater than 1:
correlation_matrix.max().max()
= 1.000029
When I investigated further with the following check:
counter = 0
for i in range(len(correlation_matrix.columns)):
    for j in range(len(correlation_matrix)):
        if correlation_matrix.iloc[i, j] > 1.0:
            counter += 1
...it turns out that around 100 of the entries are slightly over 1, which should not be possible for a correlation coefficient. What could be the reason for this?
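For reference, the same count can be obtained without the nested loops, along with the size of the overshoot. This is only a minimal sketch: it assumes NumPy is available, uses a small random DataFrame as a stand-in for my real data, and the names `n_over`, `max_excess`, and `tol` (and the tolerance value) are illustrative choices rather than anything prescribed by pandas.

```python
import numpy as np
import pandas as pd

# Stand-in data (hypothetical); the real df is 5000x1000.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 20)))

correlation_matrix = df.corr()
values = correlation_matrix.to_numpy()

# Count entries strictly greater than 1 in one vectorized pass.
n_over = int((values > 1.0).sum())

# How far the largest entry is above 1 (zero or negative if none exceed it).
max_excess = float(values.max() - 1.0)

# Count entries that exceed 1 by more than a small tolerance (assumed value).
tol = 1e-6
n_over_tol = int((values > 1.0 + tol).sum())

print(n_over, max_excess, n_over_tol)
```

Comparing `n_over` with `n_over_tol` shows whether the offending entries are just barely above 1 or off by a meaningful amount.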