Get statistical difference of correlation coefficient in python

Asked Jul 22 '14 at 21:31

Active Jul 31 '14 at 14:01

Viewed 694 times

To get the correlation between two arrays in Python, I am using:

from scipy.stats import pearsonr
x, y = [1,2,3], [1,5,7]
cor, p = pearsonr(x, y)

However, as stated in the docs, the p-value returned from pearsonr() is only reasonably reliable for datasets larger than about 500. So how can I get a reasonable p-value for small datasets?

My temporary solution:

After reading up on linear regression, I have come up with my own small script, which uses the Fisher transformation to get a z-score, from which the p-value is calculated:

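import numpy as np
from scipy.stats import norm  # zprob is no longer available in current SciPy releases
n = len(x)
# Fisher transformation of the correlation, scaled by its standard error 1/sqrt(n-3)
z = 0.5*np.log((1+cor)/(1-cor))*np.sqrt(n-3)
# one-sided p-value: lower tail of the standard normal at -z (what zprob(-z) returned)
p = norm.cdf(-z)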

It works. However, I am not sure whether it is more reasonable than the p-value given by pearsonr(). Is there a Python module which already has this functionality? I have not been able to find it in SciPy or Statsmodels.
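
For reference, the calculation above could be wrapped up roughly as follows. This is only a sketch under assumptions: the name fisher_z_pvalue is made up for illustration (it is not a SciPy or Statsmodels function), it relies on scipy.stats.norm for the normal tail probability, and it additionally reports a two-sided p-value, which the script above does not compute. The normal approximation behind the Fisher transformation is itself only approximate for small n.

```python
import numpy as np
from scipy.stats import norm, pearsonr


def fisher_z_pvalue(x, y):
    """Return (r, one_sided_p, two_sided_p) via the Fisher z-transformation.

    Hypothetical helper for illustration; not part of SciPy or Statsmodels.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if n < 4:
        # The standard error is 1/sqrt(n - 3), so n <= 3 gives no usable z-score.
        raise ValueError("need at least 4 pairs of values")
    r, _ = pearsonr(x, y)
    # arctanh(r) is identical to 0.5 * log((1 + r) / (1 - r))
    z = np.arctanh(r) * np.sqrt(n - 3)
    p_one_sided = norm.sf(z)            # P(Z >= z), equal to norm.cdf(-z)
    p_two_sided = 2 * norm.sf(abs(z))   # both tails
    return r, p_one_sided, p_two_sided


# Example with made-up data of a size similar to the real dataset (assumed n = 30).
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = x + rng.normal(size=30)
r, p_one, p_two = fisher_z_pvalue(x, y)
print(r, p_one, p_two)
```

Note that with the three-point example from the top of the question, n - 3 is 0, so z is always 0 regardless of the correlation; the sketch raises an error in that case instead.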

Edit to clarify:

The dataset in my example is simplified. My real dataset is two arrays of 10-50 values.

edited May 23 '17 at 11:44

Community

dwitvliet

I think this question would be a better fit for Cross Validated. – Korem Jul 22 '14 at 21:50
A correlation over a sample size of 3 is not sensible...I usually want at least a pair of 50 values before thinking a correlation might be useful. – N1B4 Jul 22 '14 at 23:10
@Korem I did consider it, but posted it here instead as it is mainly a coding issue. However, I will move it there if no one can answer here. – dwitvliet Jul 22 '14 at 23:35

0 Answers