
I would like to deal with a NaN result from the Spearman test. It looks like the nan_policy parameter is broken. How can I fix this issue?

from scipy import stats
pvalue=stats.spearmanr([10,100],[1,100])[1]
print(pvalue)

returns nan

pvalue=stats.spearmanr([10,100],[1,100],nan_policy='omit')[1]
print(pvalue)

returns nan

Link to a related question that doesn't solve this issue. Link to how to check for NaN in Python.

Sadegh

1 Answer


I'm not sure your example demonstrates a bug in the nan_policy parameter, which refers to NaNs in the inputs, not in the outputs, and there are no NaNs in your input.

You're getting nan because your samples are too short for meaningful statistics. Technically, you are probably right that a p-value should always be finite, so you could consider it a bug.
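If you just need to handle the NaN result on the caller's side, a plain np.isnan check is enough. A minimal sketch (what you do in the fallback branch is of course up to you):

    import numpy as np
    from scipy import stats

    # The n=2 example from the question: the p-value comes back as NaN.
    rho, pvalue = stats.spearmanr([10, 100], [1, 100])
    if np.isnan(pvalue):
        # Too few observations for a meaningful p-value; decide how to handle it.
        print("p-value is undefined for this sample size")
    else:
        print(pvalue)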

That said, if I do not totally misunderstand what Spearman's rank correlation coefficient is, the function does return wrong p-values, e.g.

>>> import numpy as np
>>> stats.spearmanr(np.arange(4.), np.arange(4.))
SpearmanrResult(correlation=1.0, pvalue=0.0)

having four samples in the same rank order really isn't that unlikely (it happens in 1 of the 24 possible orderings).

Edit: The above smells to me like they are using an approximation formula for the distribution of rank correlation coefficients which doesn't work too well for small n. So what can you do? If your n is small, don't use this function (sorry, I can't be more constructive; you could compute the distribution of rank correlation coefficients by brute force and then calculate the p-value yourself, as sketched below). If your actual samples are large, you're probably fine, but I would cross-check a few examples against some other stats software.
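Here is a minimal sketch of that brute-force idea; the helper name exact_spearman_pvalue is mine, not part of scipy, and the full enumeration only makes sense for very small n (it grows as n!):

    from itertools import permutations

    import numpy as np
    from scipy import stats

    def exact_spearman_pvalue(x, y):
        """Two-sided p-value for Spearman's rho by enumerating all rank orders."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        r_obs = stats.spearmanr(x, y)[0]
        perms = list(permutations(range(len(y))))
        hits = 0
        # Under the null hypothesis of independence, every rank order of y is
        # equally likely, so count how often a permuted y gives |rho| at least
        # as extreme as the observed value.
        for perm in perms:
            r = stats.spearmanr(x, y[list(perm)])[0]
            if abs(r) >= abs(r_obs) - 1e-12:  # tolerance for floating-point ties
                hits += 1
        return hits / len(perms)

    print(exact_spearman_pvalue(np.arange(4.), np.arange(4.)))

For the np.arange(4.) example above this gives 2/24 ≈ 0.083 (two-sided), rather than the 0.0 that spearmanr reports.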

Paul Panzer
    "If *n* is small don't use this function" is good advice, although I'd be less strict about it. It is fine to use the function even with small *n* to obtain *r* but the *p*-value is meaningless. That is what the documentation has to say about p-values and sample size: *The p-values are not entirely reliable but are probably reasonable for datasets larger than **500** or so.* – MB-F Feb 01 '17 at 12:00