numpy.polyfit doesn't handle NaN values

Question

I have a problem with this piece of Python code:

import matplotlib
matplotlib.use("Agg")

import numpy as np
import pylab as pl

A1=np.loadtxt('/tmp/A1.txt',delimiter=',')
A1_extrema = [min(A1),max(A1)]
A2=np.loadtxt('/tmp/A2.txt',delimiter=',')

pl.close()
ab = np.polyfit(A1,A2,1)
print(ab)
fit = np.poly1d(ab)
print(fit)
r2 = np.corrcoef(A1,A2)[0,1]
print(r2)
pl.plot(A1,A2,'r.', label='TMP36 vs. DS18B20', alpha=0.7)
pl.plot(A1_extrema,fit(A1_extrema),'c-')
pl.annotate('{0}'.format(r2) , xy=(min(A1)+0.5,fit(min(A1))), size=6, color='r' )

pl.title('Sensor correlations')
pl.xlabel("T(x) [degC]")
pl.ylabel("T(y) [degC]")
pl.grid(True)
pl.legend(loc='upper left', prop={'size':8})
pl.savefig('/tmp/C123.png')

A1 and A2 are arrays containing temperature readings from different sensors. I want to find a correlation between the two and show it graphically. Occasionally, however, a sensor read error occurs, and a NaN is written to one of the files in place of a temperature value. np.polyfit then fails to fit the data and returns [nan, nan], and everything after that fails as well.

My question: How can I convince numpy.polyfit to ignore the NaN values? N.B.: Datasets are relatively small at the moment. I expect that they may grow to about 200k...600k elements once deployed.

possible duplicate of [Python programming - numpy polyfit saying NAN](http://stackoverflow.com/questions/13675912/python-programming-numpy-polyfit-saying-nan) — farenorth, Feb 21 '15 at 16:57
The solution to that question has your answer. In your case you would do `idx = np.isfinite(A1) & np.isfinite(A2)` then call polyfit, `ab = np.polyfit(A1[idx], A2[idx], 1)`. — farenorth, Feb 21 '15 at 17:01
@farenorth : Thanks. Please make your second comment an answer and I'll accept it. — Mausy5043, Feb 21 '15 at 17:28

TomCho · Accepted Answer · 2016-05-22T21:43:52.777


I know this is a little old, but if your arrays contain NaNs, you have to "clean them up" by keeping only the indices at which both arrays are finite. The way to do this is:

idx = np.isfinite(x) & np.isfinite(y)
ab = np.polyfit(x[idx], y[idx], 1)

That way you pass only the "good" points to polyfit.
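To tie this back to the code in the question, here is a minimal sketch that applies the same mask to both `np.polyfit` and `np.corrcoef`, since the correlation coefficient is affected by NaNs in the same way. The helper name `fit_ignoring_nans` and the sample readings are made up for illustration; they are not from the original post. The plotting range is taken from the masked data because the built-in `min`/`max` are not reliable on arrays containing NaN.

```python
import numpy as np

def fit_ignoring_nans(x, y, deg=1):
    """Fit a polynomial using only the pairs where both x and y are finite.

    Hypothetical helper for illustration; returns the coefficients,
    the correlation coefficient, and the boolean mask that was used.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # True only where BOTH readings are usable (no NaN, no inf)
    idx = np.isfinite(x) & np.isfinite(y)
    coeffs = np.polyfit(x[idx], y[idx], deg)
    r = np.corrcoef(x[idx], y[idx])[0, 1]
    return coeffs, r, idx

# Made-up readings lying on y = 2*x + 1, with one NaN in each array
A1 = np.array([20.0, 21.0, np.nan, 23.0, 24.0, 25.0])
A2 = np.array([41.0, 43.0, 45.0, np.nan, 49.0, 51.0])

ab, r, idx = fit_ignoring_nans(A1, A2)
fit = np.poly1d(ab)

# Endpoints for drawing the fitted line, taken from the valid points only
A1_extrema = [A1[idx].min(), A1[idx].max()]

print(ab)
print(r)
print(fit(A1_extrema))
```

The mask is a single vectorized pass over each array, so this should remain cheap for the few hundred thousand elements mentioned in the question.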

edited May 22 '16 at 21:43

answered May 20 '16 at 21:01

