
I am getting this error from scikit-learn:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

This results from this check here. Based on this post, I can use df.replace([np.inf, -np.inf], np.nan).dropna(axis=1), because I want to detect any NaN or inf values and then remove every row containing one of them. However, I use Python 3.6 and get this error:

AttributeError: 'NoneType' object has no attribute 'dropna'

How can I modify df.replace([np.inf, -np.inf], np.nan).dropna(axis=1) so that I can detect a row containing inf or nan and then remove it?

Kristofer

1 Answer


You almost had it: use dropna() with axis=0 (the default), which operates on rows; axis=1 drops columns instead. After the infinities are replaced with NaN, it will drop every row that contains a NaN:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'x': [1, 2, np.nan, np.inf, 3], 'y': [5, 6, 7, 8, np.inf]})
>>> df

          x         y
0  1.000000  5.000000
1  2.000000  6.000000
2       NaN  7.000000
3       inf  8.000000
4  3.000000       inf

>>> new_df = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)

>>> new_df
     x    y
0  1.0  5.0
1  2.0  6.0
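
If you also want to see which rows are affected before dropping them, the same idea can be split into two steps. This is only a minimal sketch: the helper names find_bad_rows and drop_bad_rows are made up for illustration and are not part of pandas or scikit-learn.

```python
import numpy as np
import pandas as pd


def find_bad_rows(frame):
    """Return a boolean Series that is True for rows holding NaN, inf or -inf.

    Hypothetical helper, named for illustration only.
    """
    # Turn +/-inf into NaN so that one isna() check covers both cases.
    cleaned = frame.replace([np.inf, -np.inf], np.nan)
    # any(axis=1) reduces across the columns, giving one flag per row.
    return cleaned.isna().any(axis=1)


def drop_bad_rows(frame):
    """Return only the rows in which every value is finite."""
    return frame.loc[~find_bad_rows(frame)]


df = pd.DataFrame({'x': [1, 2, np.nan, np.inf, 3], 'y': [5, 6, 7, 8, np.inf]})
bad = find_bad_rows(df)      # rows 2, 3 and 4 are flagged
new_df = drop_bad_rows(df)   # rows 0 and 1 remain
```

As for the AttributeError: the question does not show the exact call, so this is a guess, but one common cause is passing inplace=True to replace(). In the pandas versions of that time, an in-place call returned None, so chaining .dropna() onto it fails. Leaving out inplace and assigning the result, as above, avoids that.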
sacuL