Edit: Basically solved I think.

I am using spearmanr from scipy.stats to find the correlations between variables across a number of different samples. I have around 2500 variables and 36 samples (or 'observations').

If I calculate the correlations using all 36 samples, spearmanr works fine. If I use only the first 18 samples, it also works fine. However, if I use only the last 18 samples, I get runtime warnings and nans are returned.

These are the warnings:

/Home/s1215235/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:1945: RuntimeWarning: invalid value encountered in true_divide
return c / sqrt(multiply.outer(d, d))
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1718: RuntimeWarning: invalid value encountered in greater
cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1718: RuntimeWarning: invalid value encountered in less
cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1719: RuntimeWarning: invalid value encountered in less_equal
cond2 = cond0 & (x <= self.a)

This is the code:

populationdata = np.vstack(thing).astype(float)
rho, pval = stats.spearmanr(populationdata[:,sampleindexes], axis = 1)

(populationdata is a numpy array full of floats; [:,sampleindexes] selects only some of the columns.)

And this is what rho is returned as:

[[ 1.                 nan         nan ...,  1.         -0.05882353
  -0.08574929]
 [        nan         nan         nan ...,         nan         nan
          nan]
 [        nan         nan         nan ...,         nan         nan
          nan]
 ..., 
 [ 1.                 nan         nan ...,  1.         -0.05882353
  -0.08574929]
 [-0.05882353         nan         nan ..., -0.05882353  1.          0.68599434]
 [-0.08574929         nan         nan ..., -0.08574929  0.68599434  1.        ]]
Catherine Georgia
  • What is `thing`? Does it have nans/infs? A minimal runnable example would help debugging this further – ev-br Aug 20 '15 at 11:19
  • I don't know how to do a minimal running example without including the data which is causing the error, which will not be particularly minimal. But no, the data does not have nans or infs, it's just numbers. There are a lot of 0s though. – Catherine Georgia Aug 20 '15 at 11:48
  • *"There are a lot of 0s though."* So `populationdata[:,sampleindexes]` probably has rows that are *all* 0. That will cause `spearmanr` to generate `nan` (e.g. try `spearmanr([[0, 0, 0], [1, 2, 3]], axis=1)`). – Warren Weckesser Aug 20 '15 at 12:20
  • Thanks didn't know that! – Catherine Georgia Aug 20 '15 at 12:46
  • @CatherineGeorgia: It looks like the comment solved your problem, so I'll make it an answer. – Warren Weckesser Aug 21 '15 at 02:07
  • This issue has been raised here: https://github.com/scipy/scipy/issues/3728 with a proposed solution of checking for uniform input with `np.ptp(x) == 0`, or `np.ptp(x) < eps` to allow for floating point precision. – user2573644 Apr 17 '19 at 15:02

1 Answer

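In a comment it was noted that "There are a lot of 0s though." So populationdata[:,sampleindexes] probably has rows that are all 0. Every value in a constant row gets the same rank, so the ranks have zero variance, the correlation becomes 0/0, and spearmanr returns nan for every pair involving that row. For example,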

In [3]: spearmanr([[0, 0, 0], [1, 2, 3]], axis=1)
/Users/warren/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.py:1957: RuntimeWarning: invalid value encountered in true_divide
  return c / sqrt(multiply.outer(d, d))
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1728: RuntimeWarning: invalid value encountered in greater
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1728: RuntimeWarning: invalid value encountered in less
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1729: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
Out[3]: (nan, nan)
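
One possible workaround, sketched here as an illustration rather than as anything the question's code already does, is to drop the constant rows before calling spearmanr. The helper name spearman_without_constant_rows and the toy array are made up for this example; the np.ptp check is the one suggested in the linked SciPy issue.

```python
import numpy as np
from scipy import stats


def spearman_without_constant_rows(data):
    """Hypothetical helper: drop constant rows, then correlate the rest.

    Each row is treated as a variable and each column as a sample,
    matching the axis=1 call used in the question.
    """
    data = np.asarray(data, dtype=float)
    # np.ptp is max - min along the axis; 0 means the whole row is one value.
    keep = np.ptp(data, axis=1) > 0
    rho, pval = stats.spearmanr(data[keep], axis=1)
    return rho, pval, keep


# Made-up toy data: 5 variables (rows) observed in 5 samples (columns).
toy = np.array([
    [0, 0, 0, 0, 0],  # constant row, would produce nan
    [1, 2, 3, 4, 5],
    [5, 3, 4, 1, 2],
    [2, 2, 2, 2, 2],  # constant row, would produce nan
    [1, 3, 2, 5, 4],
])

rho, pval, keep = spearman_without_constant_rows(toy)
print(keep)       # boolean mask of the rows that were kept
print(rho.shape)  # square matrix over the kept rows only
```

The mask keep maps the rows of rho back to the original variables. Note that if only two rows survive the filter, spearmanr returns a single coefficient instead of a matrix, so real code may need to handle that case separately.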
Warren Weckesser