Dropping 'nan' with Pearson's r in scipy/pandas

Question

Quick question: Is there a way to use 'dropna' with the Pearson's r function in scipy? I'm using it in conjunction with pandas, and some of my data has holes in it. I know you used to be able to suppress 'nan' with Spearman's r in older versions of scipy, but that functionality is now missing.

To my mind, this seems like a disimprovement, so I wonder if I'm missing something obvious.

My code:

for i in range(len(frame3.columns)):    
    correlation.append(sp.pearsonr(frame3.iloc[ :,i], control['CONTROL']))

Yes, you can use `dropna` for that. What's your question, exactly? — Ami Tavory, Aug 11 '16 at 11:05
Really? Every time I append it I get an index error. I've added my code above; where's the appropriate place to put it? — Lodore66, Aug 11 '16 at 11:19
*"...that functionality is now missing."* Are you referring to the `nan_policy` argument? That is still in `spearmanr`. In fact, the link that you referred to as "older versions" is the documentation for the most recent release, 0.18.0. What version are you using? Check by running `import scipy; print(scipy.__version__)` — Warren Weckesser, Aug 11 '16 at 17:43
@WarrenWeckesser I think he might have confused spearman's with pearson's. There is no `nan_policy` for scipy.stats.pearsonr — rovyko, Oct 18 '18 at 19:31
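
For reference, here is a minimal sketch of the `nan_policy` argument discussed in the comments above, as it applies to `scipy.stats.spearmanr`. The arrays are made-up illustration data, not the asker's, and the exact result type depends on the SciPy version, so the result is simply unpacked as a pair:

```python
import numpy as np
from scipy import stats

# Made-up data with holes in different positions (illustration only).
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
y = np.array([2.1, np.nan, 6.2, 8.1, 9.8, 7.5])

# nan_policy="omit" tells spearmanr to leave out observations containing NaN
# instead of propagating NaN into the result.
rho, p = stats.spearmanr(x, y, nan_policy="omit")
print(rho, p)
```

This only covers Spearman's rank correlation; the answers below deal with `pearsonr`.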

score 20 · Accepted Answer · answered Aug 11 '16 at 11:23

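You can use np.isnan like this:

# assumes: import numpy as np; import scipy.stats as sp
for i in range(len(frame3.columns)):
    x, y = frame3.iloc[:, i].values, control['CONTROL'].values
    nas = np.logical_or(np.isnan(x), np.isnan(y))  # True where either value is NaN
    corr = sp.pearsonr(x[~nas], y[~nas])  # correlate only the complete pairs
    correlation.append(corr)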
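
If you need this in more than one place, the masking step can be wrapped in a small helper. The sketch below is only an illustration: `pearsonr_dropna` is a hypothetical name (it is not a SciPy function), and the arrays are made-up data rather than the asker's `frame3` and `control`.

```python
import numpy as np
from scipy import stats


def pearsonr_dropna(x, y):
    """Pearson's r over the positions where both x and y are non-NaN.

    Hypothetical helper for illustration; not part of SciPy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # True where both values are present
    return stats.pearsonr(x[keep], y[keep])


# Made-up data with holes in different positions.
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
y = np.array([2.1, np.nan, 6.2, 8.1, 9.8, 12.3])

r, p = pearsonr_dropna(x, y)
print(r, p)
```

Note that the mask removes a row whenever either side is missing, so the two arrays stay the same length, which `pearsonr` requires.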

Ami Tavory

I got the error `AttributeError: 'numpy.ndarray' object has no attribute 'isnan'` – Steve Scott Oct 24 '19 at 16:01

@SteveScott: instead of `x.isnan()`, try `np.isnan(x)` – ramesh Dec 02 '19 at 20:05

Daniel Gibson · Answer 2 · 2017-03-16T22:08:04.130

1

You can also create a temporary dataframe and use pandas' built-in method for computing the Pearson correlation, or call the .dropna method on the temporary dataframe to drop null values before using sp.pearsonr.

for col in frame3.columns:
    # join() aligns on the index; DataFrame.corr() excludes NaN values pairwise
    pair = frame3[col].to_frame(name='3').join(control['CONTROL'])
    correlation.append(pair.corr()['3']['CONTROL'])
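
A shorter route along the same lines, sketched here with made-up stand-ins for `frame3` and `control`, is to let pandas do the pairing: `Series.corr` and `DataFrame.corrwith` both compute Pearson's r by default and leave out rows where either value is missing. Unlike `sp.pearsonr`, this gives only the coefficient, not a p-value.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the question's `frame3` and `control`.
frame3 = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [5.0, np.nan, 3.0, 2.0, 1.5],
})
control = pd.DataFrame({"CONTROL": [1.1, 1.9, 3.2, np.nan, 5.1]})

# One Pearson r per column of frame3; incomplete pairs are dropped per column.
correlation = frame3.corrwith(control["CONTROL"])
print(correlation)

# The same coefficient for a single column.
r_a = frame3["a"].corr(control["CONTROL"])
print(r_a)
```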

edited Mar 16 '17 at 22:08

answered Mar 16 '17 at 22:02

This makes some assumptions about the join, e.g. that the indices are compatible – Daniel Gibson Mar 16 '17 at 22:04
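
To illustrate that caveat, here is a small sketch with made-up Series whose index labels do not overlap. pandas pairs values by label rather than by position, so the join finds nothing to match; resetting the index is one possible way to pair by position instead, assuming the rows really are in the same order.

```python
import numpy as np
import pandas as pd

# Same length, but no index labels in common (illustration only).
col = pd.Series([1.0, 2.0, 3.0, 4.0], index=[0, 1, 2, 3], name="3")
ctrl = pd.Series([2.0, 4.0, 6.0, 9.0], index=[10, 11, 12, 13], name="CONTROL")

# The join matches on labels, so every CONTROL value comes back missing.
joined = col.to_frame().join(ctrl)
all_missing = joined["CONTROL"].isna().all()
print(all_missing)

# Series.corr aligns on labels too, so there is nothing to correlate.
r_labels = col.corr(ctrl)
print(r_labels)

# Dropping the labels pairs the values by position instead.
r_position = col.corr(ctrl.reset_index(drop=True))
print(r_position)
```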
