I want to compute the correlation between two arrays, and I would like to use NumPy for this.

I used the numpy.correlate function on a small example:

import numpy as np

a = [1, 2, 3]
b = [2, 3, 4]

>>> np.correlate(a, b)
array([20])

I don't really know how to interpret that result. What I would like to have is a number between -1 and 1 to indicate the correlation, with 1 meaning the arrays are positively related and -1 meaning the arrays are negatively related.

How can I get this number?

JNevens

1 Answer

You're using the wrong function. You're looking for numpy.corrcoef, which actually calculates the correlation coefficient.

a = [1, 2, 3]
b = [2, 3, 4]

>>> np.corrcoef(a, b)
array([[ 1.,  1.],
       [ 1.,  1.]])
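
To get a single number out of that matrix, one option is to index an off-diagonal entry. The following is a minimal sketch; the array c is a hypothetical addition (not part of the question) included only to show a negative relationship.

```python
import numpy as np

a = [1, 2, 3]
b = [2, 3, 4]
c = [4, 3, 2]  # hypothetical array that decreases as a increases

# np.corrcoef returns a 2x2 matrix; entry [0, 1] relates the first
# input to the second (entry [1, 0] holds the same value).
r_ab = np.corrcoef(a, b)[0, 1]
r_ac = np.corrcoef(a, c)[0, 1]

print(r_ab)  # approximately 1: a and b rise together
print(r_ac)  # approximately -1: c falls as a rises
```

The diagonal entries are always 1, since each array is perfectly correlated with itself.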

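As mentioned by Hooked, this returns a matrix: the 2x2 matrix of correlation coefficients, which is the covariance matrix normalized by the standard deviations. The off-diagonal entries, such as np.corrcoef(a, b)[0, 1], hold the Pearson correlation coefficient between a and b.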

Should you want the Pearson correlation coefficient as a single number, you can use pearsonr from scipy.stats, which returns the coefficient along with a p-value. Hooked's answer here is a proper implementation of this method.
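
For intuition about why np.correlate gave 20 while the coefficient is 1, here is a rough sketch of the Pearson formula written out in plain NumPy. The helper name pearson_r is hypothetical, chosen for illustration; it is not a NumPy or SciPy function.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: centered dot product divided by the norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # remove the mean of each array
    yc = y - y.mean()
    return np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc))

a = [1, 2, 3]
b = [2, 3, 4]

# With its default mode, np.correlate on equal-length inputs is a raw,
# unnormalized dot product: 1*2 + 2*3 + 3*4 = 20.
raw = np.dot(a, b)

# Centering and normalizing maps the result into the range [-1, 1].
r = pearson_r(a, b)
print(raw, r)
```

The raw dot product depends on the scale and offset of the data, which is why it cannot be read as a correlation strength on its own.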

miradulo
  • It should be noted that this returns a _matrix_ of values from the covariance matrix. I think OP is looking for the pearson correlation coefficient. – Hooked Apr 13 '15 at 18:26
  • How to interpret this matrix? Isn't it possible to get a single number? – JNevens Apr 13 '15 at 18:27
  • @JNevens Ahh, you are looking for the Pearson. I'd follow the dupe then, Hooked has a good answer there. – miradulo Apr 13 '15 at 18:27