I am trying to oversample my dataset before training, but I get the error "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')" even though there are no NaN values.

This is the code that gives the error:

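import pandas as pd
from imblearn.over_sampling import SMOTE

# X holds the features and y the class labels
sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)
y_res = pd.DataFrame(y_res)
print(y_res[0].value_counts())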

This is the error I get:

ValueError                                Traceback (most recent call last)
<ipython-input-18-001a5445f47a> in <module>()
  1 sm = SMOTE(random_state=42)
----> 2 X_res, y_res = sm.fit_resample(X, y)
  3 y_res = pd.DataFrame(y_res)
  4 print(y_res[0].value_counts())

3 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
114             raise ValueError(
115                 msg_err.format(
--> 116                     type_err, msg_dtype if msg_dtype is not None else X.dtype
117                 )
118             )

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
KyloDank

1 Answer

This line of code works; however, keep in mind that it replaces your null and infinite values with 0:

df = df.replace((np.inf, -np.inf, np.nan), 0).reset_index(drop=True)
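Before replacing anything, it may help to see which columns actually hold the offending values. The error message covers infinities as well as NaN, and a NaN-only check such as isna() does not flag infinities. Below is a minimal sketch of that check, assuming an all-numeric DataFrame; the toy df and its column names are made up for illustration and are not the asker's data. It also shows dropping the affected rows as one possible alternative to filling with 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real feature data.
df = pd.DataFrame({
    "a": [1.0, np.inf, 3.0, 4.0],
    "b": [0.5, 2.0, np.nan, 1.5],
    "c": [10.0, 20.0, 30.0, 40.0],
})

# Count NaN and infinite entries per column separately.
values = df.to_numpy(dtype="float64")
nan_counts = pd.Series(np.isnan(values).sum(axis=0), index=df.columns)
inf_counts = pd.Series(np.isinf(values).sum(axis=0), index=df.columns)
print("NaN per column:", nan_counts.to_dict())
print("inf per column:", inf_counts.to_dict())

# Alternative to filling with 0: treat infinities as missing, then drop
# every row that has a missing value.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna().reset_index(drop=True)
print(cleaned)
```

Whether to fill with 0 or drop the rows depends on the data: filling keeps every sample but writes an artificial value into the features, while dropping shrinks the dataset. If rows are dropped from the features, the same rows have to be dropped from the labels so that X and y stay aligned.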
Kas