from scipy import stats

values = ([0, 2, 1, float('nan'), 6], [4, 4, 7, 6, 7], [9, 7, 8, 9, 10])
time = [0, 1, 2, 3, 4]
slope_1 = stats.linregress(time, values[1])  # This works
slope_0 = stats.linregress(time, values[0])  # This doesn't work: the NaN propagates into the result

Is there a way to ignore the NaN and do the linear regression on remaining values?

Thanks a lot in advance.

-gv

user2340760
  • See http://stackoverflow.com/questions/13643363/linear-regression-of-arrays-containing-nans-in-python-numpy – copeg Jul 05 '16 at 17:45
  • I was going to say "use `numpy.polyfit()`", but it has the [same problem](http://stackoverflow.com/questions/13675912/python-programming-numpy-polyfit-saying-nan). – Michael Molter Jul 05 '16 at 18:30
  • If you are using a DataFrame, `df.dropna(inplace=True)` by default drops every row in which any column is NA. Alternatively, use `df.fillna()` with a fill strategy. – MANU Jun 25 '20 at 12:48
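
The masking approach described in the question linked above can be sketched for the data in this question as follows. This is only a minimal sketch: the helper name `linregress_ignoring_nan` is made up for illustration, and it assumes the missing entry is a real floating-point NaN rather than the string 'NaN'.

```python
import numpy as np
from scipy import stats

# Data from the question, with the missing value stored as a real NaN
values = np.array([[0, 2, 1, np.nan, 6],
                   [4, 4, 7, 6, 7],
                   [9, 7, 8, 9, 10]], dtype=float)
time = np.array([0, 1, 2, 3, 4], dtype=float)


def linregress_ignoring_nan(x, y):
    """Hypothetical helper: fit only the points where both x and y are present."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    return stats.linregress(x[mask], y[mask])


result_0 = linregress_ignoring_nan(time, values[0])
print(result_0.slope, result_0.intercept)  # slope is about 1.4, intercept about -0.2
```

The mask is applied to both arrays at once so that each remaining `time` value stays paired with its own observation.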

1 Answer


Yes, you can do this using statsmodels:

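import statsmodels.api as sm
from numpy import nan  # the NaN alias was removed in NumPy 2.0

x = [0, 2, nan, 4, 5, 6, 7, 8]
y = [1, 3, 4,   5, 6, 7, 8, 9]

# missing='drop' discards every observation that contains a NaN before fitting
model = sm.OLS(y, x, missing='drop')
results = model.fit()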

In [2]: results.params
Out[2]: array([ 1.16494845])

This gives the same result as removing the observation with the missing value by hand:

x = [0, 2, 4, 5, 6, 7, 8]
y = [1, 3, 5, 6, 7, 8, 9]

model = sm.OLS(y, x)
results = model.fit()

In [4]: results.params
Out[4]: array([ 1.16494845])

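The only difference is that `missing='drop'` handles the missing data automatically. The `missing` argument also accepts 'none' and 'raise'; see http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.html

Note that `sm.OLS` does not add an intercept on its own, so both fits above are lines through the origin. Pass `sm.add_constant(x)` as the design matrix if you want a slope and intercept comparable to `stats.linregress`.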
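
To make the arithmetic concrete, here is a minimal pure-Python sketch of what dropping the missing observation amounts to for this data. It is not how statsmodels is implemented, and the helper names are made up for illustration. It also shows that the value 1.16494845 is the slope of a line forced through the origin, whereas a fit with an intercept gives a different slope.

```python
import math


def drop_missing_pairs(x, y):
    """Keep only the (x, y) pairs in which neither value is NaN."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    return [a for a, _ in pairs], [b for _, b in pairs]


def slope_through_origin(x, y):
    """Least-squares slope for y = b*x (no intercept term)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def slope_and_intercept(x, y):
    """Least-squares slope and intercept for y = b*x + c."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


x = [0, 2, float("nan"), 4, 5, 6, 7, 8]
y = [1, 3, 4, 5, 6, 7, 8, 9]

xs, ys = drop_missing_pairs(x, y)
origin_slope = slope_through_origin(xs, ys)
slope, intercept = slope_and_intercept(xs, ys)

print(round(origin_slope, 8))  # 1.16494845, matching results.params above
print(slope, intercept)        # both approximately 1.0, since y = x + 1 here
```

After the NaN pair is dropped, the remaining points lie exactly on y = x + 1, which is why the fit with an intercept differs from the through-origin slope.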

Jeff
  • Thanks a lot. Appreciate the help. – user2340760 Jul 06 '16 at 00:17
  • No problem. Statsmodels is a nice tool if you're going to do analysis in Python. If this answered your question please accept it though, that way it shows as answered in the queues! – Jeff Jul 06 '16 at 01:23