Python linear regression with NaN

Question

from scipy import stats

values=([0,2,1,'NaN',6],[4,4,7,6,7],[9,7,8,9,10])
time=[0,1,2,3,4]
slope_1 = stats.linregress(time,values[1]) # This works
slope_0 = stats.linregress(time,values[0]) # This doesn't work (values[0] contains 'NaN')

Is there a way to ignore the NaN and do the linear regression on remaining values?

Thanks a lot in advance.

-gv

See http://stackoverflow.com/questions/13643363/linear-regression-of-arrays-containing-nans-in-python-numpy — copeg, Jul 05 '16 at 17:45
I was going to say "use `numpy.polyfit()`", but it has the [same problem](http://stackoverflow.com/questions/13675912/python-programming-numpy-polyfit-saying-nan). — Michael Molter, Jul 05 '16 at 18:30
If you are using a DataFrame, `df.dropna(inplace=True)` drops every row that contains an NA value in any column (the default behaviour). Alternatively, fill the gaps with `df.fillna()` and a strategy of your choice. — MANU, Jun 25 '20 at 12:48

score 9 · Answer 1 · answered Jul 05 '16 at 18:52

Yes, you can do this using statsmodels:

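import statsmodels.api as sm
from numpy import nan  # `NaN` was removed as an alias in NumPy 2.0

x = [0, 2, nan, 4, 5, 6, 7, 8]
y = [1, 3, 4,   5, 6, 7, 8, 9]

# missing='drop' discards every observation that contains a NaN before fitting.
# Note: sm.OLS fits no intercept unless you add one, e.g. with sm.add_constant(x).
model = sm.OLS(y, x, missing='drop')
results = model.fit()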

In [2]: results.params
Out[2]: array([ 1.16494845])

Which gives you the same result as just removing the row with missing data:

x = [0, 2, 4, 5, 6, 7, 8]
y = [1, 3, 5, 6, 7, 8, 9]

model = sm.OLS(y, x)
results = model.fit()

In [4]: results.params
Out[4]: array([ 1.16494845])

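With `missing='drop'`, statsmodels removes the incomplete row for you. The `missing` argument also accepts `'none'` (the default, which does no NaN checking) and `'raise'` (which raises an error if NaNs are present): http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.html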
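If you would rather stay with `scipy.stats.linregress` as in the question, one option is to mask out the missing points yourself before fitting. The following is a minimal sketch, not the only way to do it. It assumes the missing value is stored as a real `np.nan` rather than the string `'NaN'`, and the helper name `linregress_ignoring_nan` is made up for illustration.

```python
import numpy as np
from scipy import stats

# Data from the question, with the missing value stored as a float NaN
# (assumption: the original string 'NaN' is replaced by np.nan).
values = ([0, 2, 1, np.nan, 6], [4, 4, 7, 6, 7], [9, 7, 8, 9, 10])
time = [0, 1, 2, 3, 4]


def linregress_ignoring_nan(x, y):
    """Hypothetical helper: fit only the (x, y) pairs where both are finite."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.isfinite(x) & np.isfinite(y)  # True where neither value is NaN/inf
    return stats.linregress(x[mask], y[mask])


fit_0 = linregress_ignoring_nan(time, values[0])
fit_1 = linregress_ignoring_nan(time, values[1])  # no NaN: same as plain linregress
print(fit_0.slope, fit_0.intercept)
```

Unlike the `sm.OLS(y, x)` call above, `linregress` always fits an intercept, so the two slopes are not directly comparable unless you add a constant column to the statsmodels design matrix.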

No problem. Statsmodels is a nice tool if you're going to do analysis in Python. If this answered your question please accept it though, that way it shows as answered in the queues! — Jeff, Jul 06 '16 at 01:23
