Calculating Pearson correlation

Question

I'm trying to calculate the Pearson correlation coefficient of two variables. I want to determine whether there is a relationship between the number of postal codes and a range of distances, i.e. whether the number of postal codes increases or decreases as the distance range changes.

I'll have one list that counts the number of postal codes within each distance range, and another list that holds the ranges themselves.

Is it OK to have a list that contains ranges of distances? Or would it be better to have a list like [50, 100, 500, 1000], where each element marks the upper bound of a range? For example, the list would represent up to 50 km, then from 50 km to 100 km, and so on.
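A minimal sketch of building both kinds of list from raw data, assuming a hypothetical list distances_km that holds each postal code's distance from a fixed starting point, and using the upper bounds [50, 100, 500, 1000] from the question. The distances are made up for illustration.

```python
import bisect

# Hypothetical data: distance (km) from a fixed starting point to each postal code.
distances_km = [12, 30, 48, 75, 90, 120, 310, 450, 600, 980]

# Upper bounds of the distance bands: up to 50 km, 50-100 km, 100-500 km, 500-1000 km.
bounds = [50, 100, 500, 1000]

# Per-band counts: postal codes falling in (previous bound, bound].
per_band = [0] * len(bounds)
for d in distances_km:
    i = bisect.bisect_left(bounds, d)  # index of the first bound >= d
    if i < len(bounds):  # ignore distances beyond the last bound
        per_band[i] += 1

# Cumulative counts: postal codes within each bound ("up to" that distance).
cumulative = [sum(per_band[: i + 1]) for i in range(len(bounds))]

print(per_band)    # [3, 2, 3, 2]
print(cumulative)  # [3, 5, 8, 10]
```

Cumulative counts can never decrease as the bound grows, so their correlation with the bounds cannot be negative; per-band counts are the more informative choice for asking whether postal codes thin out with distance. Bands of unequal width will also collect different counts for that reason alone.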

@Krab Removed unnecessary information in line with SO policy. SO is a question-and-answer site, so saying "I would appreciate help" is redundant; to say thanks, you upvote and accept an answer. If you want more information on this, read the FAQ and dig around on Meta Stack Overflow. – Chris Seymour, Nov 30 '12 at 16:08

score 16 · Accepted Answer · edited Mar 02 '15 at 21:42

Use SciPy:

import scipy.stats

scipy.stats.pearsonr(x, y)

Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.

The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, the p-value calculation assumes that each dataset is normally distributed. Like other correlation coefficients, this one varies between -1 and +1, with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
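For reference, the standard sample coefficient for n paired observations (x_i, y_i), with sample means x̄ and ȳ, is shown below. The numerator is the unnormalized covariance and the denominator is the product of the two spreads, which (by the Cauchy-Schwarz inequality) keeps r between -1 and +1.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```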

The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.

Parameters :

x : 1D array

y : 1D array the same length as x

Returns :

(Pearson’s correlation coefficient, 2-tailed p-value)
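A minimal usage sketch with the distance bands from the question as x and hypothetical postal-code counts as y (the counts are made up for illustration). Unpacking into two names works both for the plain tuple returned by older SciPy versions and for the result object returned by newer ones.

```python
from scipy import stats

# Upper bound (km) of each distance band, as in the question.
x = [50, 100, 500, 1000]
# Hypothetical number of postal codes counted in each band.
y = [3, 2, 3, 2]

# pearsonr gives the coefficient and the two-sided p-value.
r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.3f}")
```

With only four bands the p-value carries little weight, in line with the caveat about small datasets above.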

edited Mar 02 '15 at 21:42 by MojiProg · answered Nov 30 '12 at 16:09 by lucasg

OK, so what matters most is that the x and y arrays are the same length. Then you are comparing element x[i] with element y[i]? – user94628 Nov 30 '12 at 16:43
Yep. In your case, x should hold the distances considered, and y[i] should be the number of postal codes at distance x[i]. To see the actual computation of the Pearson coefficient: http://stackoverflow.com/questions/3949226/calculating-pearson-correlation-and-significance-in-python – lucasg Nov 30 '12 at 16:49
Cool, so x[i] could mean up to that distance? – user94628 Nov 30 '12 at 16:52
Yes, x[i] could mean up to that distance. If all the distances are measured from a particular starting point, then x[i] is the radius of the area within that distance, and the corresponding y[i] would be how many postal codes fall inside that area. – Antimony Nov 14 '15 at 22:08
Make sure that the arrays x and y have a mean of 0. Otherwise you will get an incorrect value. – DollarAkshay Jun 14 '18 at 08:26
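Pearson's r does not change when a constant is added to x or y, because the formula subtracts each mean itself; scipy.stats.pearsonr and numpy.corrcoef do this internally, so the inputs do not need to be centered. Centering only matters if you compute a plain cosine similarity (dot product divided by the vector norms), which equals r only for zero-mean data. A minimal pure-Python sketch of the standard formula, with made-up data, that checks this:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The means are subtracted here, so the inputs need not be centered.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [50, 100, 500, 1000]  # hypothetical band upper bounds (km)
y = [3, 2, 3, 2]          # hypothetical postal-code counts

r1 = pearson(x, y)
# Shift both series by constants; r should be unchanged.
r2 = pearson([a + 1000 for a in x], [b - 7 for b in y])
print(math.isclose(r1, r2))  # True
```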

score 7 · Answer 2 · answered Nov 14 '15 at 22:15

You can also use NumPy:

import numpy

numpy.corrcoef(x, y)

which returns a 2×2 correlation matrix of the form:

[[1,                 correlation(x, y)],
 [correlation(y, x), 1                ]]
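To pull the single coefficient out of that matrix, take an off-diagonal entry; the matrix is symmetric, so either one works. A short sketch with made-up data (the values in x and y are hypothetical):

```python
import numpy as np

x = np.array([50, 100, 500, 1000])  # hypothetical band upper bounds (km)
y = np.array([3, 2, 3, 2])          # hypothetical postal-code counts

m = np.corrcoef(x, y)  # 2x2 matrix: ones on the diagonal, r off the diagonal
r = m[0, 1]            # same value as m[1, 0]
print(r)
```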

edited Dec 18 '15 at 22:24 · answered Nov 14 '15 at 22:15 by Antimony

score 0 · Answer 3 · answered Feb 15 '20 at 08:57

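Try this with pandas, where Top15 is a DataFrame containing the two columns. Note that calling .rank() before .corr() turns the result into Spearman's rank correlation; for Pearson's coefficient, correlate the raw values:

 val = Top15[['Energy Supply per Capita', 'Citable docs per Capita']].corr(method='pearson')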
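A self-contained pandas sketch, using a small made-up DataFrame in place of Top15 (the column names distance_km and postal_codes and their values are hypothetical). It shows that ranking the columns first and then applying Pearson gives the same number as Spearman's method, which differs from Pearson's coefficient on the raw values.

```python
import pandas as pd

# Hypothetical stand-in for Top15: band upper bounds and postal-code counts.
df = pd.DataFrame({
    "distance_km": [50, 100, 500, 1000],
    "postal_codes": [3, 2, 3, 2],
})

# Pearson's r on the raw values (off-diagonal entry of the 2x2 result).
pearson_r = df.corr(method="pearson").iloc[0, 1]

# Series.corr also defaults to Pearson.
series_r = df["distance_km"].corr(df["postal_codes"])

# Ranking first, then applying Pearson, yields Spearman's rank correlation.
ranked_r = df.rank().corr(method="pearson").iloc[0, 1]
spearman_rho = df.corr(method="spearman").iloc[0, 1]

print(pearson_r, series_r, ranked_r, spearman_rho)
```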

answered Feb 15 '20 at 08:57 by Shaurya

score 0 · Answer 4 · answered Oct 13 '21 at 21:02

In Python 3.10, a correlation() function was added to the statistics module of the standard library. It returns Pearson's correlation coefficient and can be used directly after importing the module:

import statistics

# words and views are two equal-length sequences of numbers
statistics.correlation(words, views)

answered Oct 13 '21 at 21:02 by Cem Önel
