Issue when trying to remove rows containing nan or inf using Pandas dataframe

Question

I am getting this error from scikit-learn:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

This error comes from this check here. Based on this post, I can use df.replace([np.inf, -np.inf], np.nan).dropna(axis=1), because I want to detect any nan or inf values and then remove the rows containing any of them. However, I am using Python 3.6, and I get this error:

AttributeError: 'NoneType' object has no attribute 'dropna'

How can I modify df.replace([np.inf, -np.inf], np.nan).dropna(axis=1) so that I can detect a row containing inf or nan and then remove it?

score 3 · Accepted Answer · answered May 11 '18 at 22:06

You almost had it: use dropna() with axis=0 (the default), which operates on rows; axis=1 drops columns instead. After the replace, it drops every row that contains a NaN:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'x': [1, 2, np.nan, np.inf, 3], 'y': [5, 6, 7, 8, np.inf]})
>>> df
          x         y
0  1.000000  5.000000
1  2.000000  6.000000
2       NaN  7.000000
3       inf  8.000000
4  3.000000       inf

>>> new_df = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
>>> new_df
     x    y
0  1.0  5.0
1  2.0  6.0
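
To make the difference between the two axes concrete, here is a minimal sketch using the same example frame. The variable names (cleaned, rows_dropped, cols_dropped, finite_rows) are only illustrative, and the np.isfinite mask at the end is an alternative I am adding, not something from the question; it assumes every column is numeric.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, np.nan, np.inf, 3],
                   'y': [5, 6, 7, 8, np.inf]})

# Turn +inf and -inf into NaN so that dropna() can see them.
cleaned = df.replace([np.inf, -np.inf], np.nan)

# axis=0 (the default) drops every ROW that contains a NaN.
rows_dropped = cleaned.dropna(axis=0)

# axis=1 drops every COLUMN that contains a NaN. Both columns have
# one here, so no columns are left at all.
cols_dropped = cleaned.dropna(axis=1)

# Alternative (assumes all columns are numeric): keep only the rows
# whose values are all finite, without an intermediate replace().
finite_rows = df[np.isfinite(df.to_numpy()).all(axis=1)]

print(rows_dropped)
print(cols_dropped.shape)
print(finite_rows)
```

Note that dropna() returns a new DataFrame and leaves df untouched, so assign the result to a name as above if you want to keep it.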

Thank you so much. 5 minutes til accepting your answer – Kristofer May 11 '18 at 22:08
