Python Replacing missing values

Question

I am trying to replace missing values in a specific column of a dataframe, but I am running into issues. I have tried:

from sklearn.impute import SimpleImputer
fill_0_with_mean = SimpleImputer(missing_values=0, strategy='mean')
X_train['Age'] = fill_0_with_mean.fit_transform(X_train['Age'])

and

X_train[:,15] = fill_0_with_mean.fit_transform(X_train[:,15])

and

X_train[:,15:16] = fill_0_with_mean.fit_transform(X_train[:,15:16])

and

X_train['Age'] = fill_0_with_mean.fit_transform(X_train['Age'].values)

and

X_train[:,15:16] = fill_0_with_mean.fit_transform(X_train[:,15:16].values)

But I keep getting errors such as "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')." or "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices".

I have both zero and missing (NaN) values in my data. Can the imputer only handle one of the two? How do I go about doing this? I have also tried casting my Age column to an integer:

X_train['Age'] = X_train['Age'].astype('int32')

But this just gives me other errors.

My data looks like this (the Age column):

	Age
0	31.0
1	79.0
2	53.0
3	40.0
4	55.0
	...
44872	NaN
44873	NaN
44874	NaN
44875	NaN
44876	NaN

Is it possible that numpy and pandas are getting mixed up? I used this to split my data into training and testing:

from sklearn.model_selection import train_test_split

dep_var = ['is_overdue']
features = model_data2.columns
features = features.drop(dep_var)

print(features)

X = model_data2[features].values
Y = model_data2[dep_var].values

split_test_size = 0.30

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=split_test_size, random_state=42)

I'd greatly appreciate the help.

Does this answer your question? [sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')](https://stackoverflow.com/questions/31323499/sklearn-error-valueerror-input-contains-nan-infinity-or-a-value-too-large-for) — PV8, Jul 16 '21 at 08:08

score 0 · Answer 1 · answered Jul 16 '21 at 04:48

Since you want to replace 0 with the mean, first fill the NaN values with 0 so the imputer treats both as missing:

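# Treat 0 as the missing marker; NaN is filled with 0 first so both get imputed
fill_0_with_mean = SimpleImputer(missing_values=0, strategy='mean')
# SimpleImputer expects 2D input, hence the double brackets; ravel() flattens the result
X_train['Age'] = fill_0_with_mean.fit_transform(X_train[['Age']].fillna(0)).ravel()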
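
To check the idea on something small, here is a minimal sketch with made-up toy data (not the asker's). It assumes X_train is a pandas DataFrame with an 'Age' column. SimpleImputer expects 2D input, so the column is selected with double brackets and the result is flattened before it is assigned back.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy stand-in for X_train (hypothetical values, for illustration only).
X_train = pd.DataFrame({"Age": [31.0, 79.0, 0.0, 40.0, np.nan]})

# 0 is the missing marker, so NaN has to be turned into 0 first.
fill_0_with_mean = SimpleImputer(missing_values=0, strategy="mean")

# Double brackets give a 2D (n_rows, 1) frame, which SimpleImputer expects.
age_2d = X_train[["Age"]].fillna(0)

# fit_transform returns a 2D array; ravel() flattens it back to one column.
X_train["Age"] = fill_0_with_mean.fit_transform(age_2d).ravel()

print(X_train)
```

Both the 0 and the NaN end up as the mean of the remaining values (31, 79 and 40), since zeros are excluded when the mean is computed.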

Corralien


Then I get this error: `IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices` – GenDemo Jul 19 '21 at 05:48
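
That IndexError is what NumPy raises when an array is indexed with a column name. In the question, X comes from model_data2[features].values, so X_train is a NumPy array rather than a DataFrame and X_train['Age'] cannot work. A minimal sketch of the same approach on a NumPy array follows. It assumes the array has a float dtype, and it uses made-up toy data with Age in column 1 (it would be column 15 in the question).

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy stand-in for the NumPy X_train (hypothetical values); Age is column 1 here.
X_train = np.array([[1.0, 31.0],
                    [2.0, 0.0],
                    [3.0, np.nan],
                    [4.0, 79.0]])
age_col = 1  # assumption: would be 15 for the array in the question

# Slicing with age_col:age_col + 1 keeps the column 2D, as SimpleImputer expects.
age_2d = X_train[:, age_col:age_col + 1]

# Turn NaN into 0 so that 0 is the only missing marker left.
age_2d = np.where(np.isnan(age_2d), 0.0, age_2d)

fill_0_with_mean = SimpleImputer(missing_values=0, strategy="mean")
X_train[:, age_col:age_col + 1] = fill_0_with_mean.fit_transform(age_2d)

print(X_train)
```

For X_test, the same NaN-to-0 step followed by fill_0_with_mean.transform(...) reuses the mean learned from the training data. Another option is to pass the DataFrame itself (without .values) to train_test_split, which keeps the column names available.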
