I have a problem with this piece of Python code:

import matplotlib
matplotlib.use("Agg")

import numpy as np
import pylab as pl

A1=np.loadtxt('/tmp/A1.txt',delimiter=',')
A1_extrema = [min(A1),max(A1)]
A2=np.loadtxt('/tmp/A2.txt',delimiter=',')

pl.close()
ab = np.polyfit(A1,A2,1)
print(ab)
fit = np.poly1d(ab)
print(fit)
r2 = np.corrcoef(A1,A2)[0,1]
print(r2)
pl.plot(A1,A2,'r.', label='TMP36 vs. DS18B20', alpha=0.7)
pl.plot(A1_extrema,fit(A1_extrema),'c-')
pl.annotate('{0}'.format(r2) , xy=(min(A1)+0.5,fit(min(A1))), size=6, color='r' )

pl.title('Sensor correlations')
pl.xlabel("T(x) [degC]")
pl.ylabel("T(y) [degC]")
pl.grid(True)
pl.legend(loc='upper left', prop={'size':8})
pl.savefig('/tmp/C123.png')

A1 and A2 are arrays containing temperature readings from two different sensors. I want to find the correlation between the two and show it graphically. Occasionally, however, a sensor read error occurs, and a NaN is written to one of the files in place of a temperature value. np.polyfit then fails to fit the data and returns [nan, nan], and everything after that fails as well.

My question: How can I convince numpy.polyfit to ignore the NaN values? N.B.: The datasets are relatively small at the moment. I expect that they may grow to about 200k to 600k elements once deployed.

Daniel
Mausy5043
  • possible duplicate of [Python programming - numpy polyfit saying NAN](http://stackoverflow.com/questions/13675912/python-programming-numpy-polyfit-saying-nan) – farenorth Feb 21 '15 at 16:57
  • The solution to that question has your answer. In your case you would do `idx = np.isfinite(A1) & np.isfinite(A2)` then call polyfit, `ab = np.polyfit(A1[idx], A2[idx], 1)`. – farenorth Feb 21 '15 at 17:01
  • @farenorth: Thanks. Please make your second comment an answer and I'll accept it. – Mausy5043 Feb 21 '15 at 17:28

1 Answer

I know this is a little old, but if your arrays contain NaNs, you have to "clean them up" by keeping only the indexes where both values are finite (here x and y stand for your A1 and A2). The way to do this is

idx = np.isfinite(x) & np.isfinite(y)
ab = np.polyfit(x[idx], y[idx], 1)

That way you pass only the "good" points to polyfit.
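
The other NaN-sensitive calls in the question (min/max for the line endpoints and np.corrcoef) can reuse the same mask. Below is a minimal sketch of that idea, not a definitive implementation: the helper name `fit_ignoring_nans` and the sample readings are made up for illustration and stand in for the real A1 and A2 files.

```python
import numpy as np

def fit_ignoring_nans(x, y, deg=1):
    """Hypothetical helper: fit only the pairs where both x and y are finite."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # True only where both readings are usable (neither NaN nor +/-inf).
    idx = np.isfinite(x) & np.isfinite(y)
    coeffs = np.polyfit(x[idx], y[idx], deg)
    # Correlation coefficient computed on the same valid pairs.
    r = np.corrcoef(x[idx], y[idx])[0, 1]
    return coeffs, r, idx

# Made-up readings; the valid pairs lie on y = 2*x + 1.
A1 = np.array([20.0, 21.0, np.nan, 23.0, 24.0, 25.0])
A2 = np.array([41.0, 43.0, 45.0, np.nan, 49.0, 51.0])

ab, r, idx = fit_ignoring_nans(A1, A2)
fit = np.poly1d(ab)
# Endpoints for drawing the fit line, taken from the valid points only.
A1_extrema = [A1[idx].min(), A1[idx].max()]
print(ab, r, A1_extrema)
```

Building the mask is a single vectorised pass over the arrays, so it should stay cheap for a few hundred thousand elements. For plotting, `pl.plot(A1[idx], A2[idx], 'r.')` would likewise draw only the valid pairs.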

TomCho