I have a dataset in which missing data is marked with ? (not NaN). I want to replace each ? with the mean of its column. For example, my dataset looks like this:
0,1,2,3
1,2,5,1.2
2,4,8,2.3
3,5,?,1
I want to replace ? with (2+5+8)/3 = 5, so the data will look like this:
0,1,2,3
1,2,5,1.2
2,4,8,2.3
3,5,5,1
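To make the target concrete, the arithmetic can be sketched in plain pandas. This only illustrates the expected result and is not the SimpleImputer approach the question is about. The inline `raw` string and the use of `na_values="?"` are assumptions made for the example.

```python
import io
import pandas as pd

# Inline copy of the example data (hypothetical stand-in for the real file);
# "?" marks the missing value.
raw = "0,1,2,3\n1,2,5,1.2\n2,4,8,2.3\n3,5,?,1\n"

# na_values="?" tells pandas to read "?" as NaN, so column 2 stays numeric.
df = pd.read_csv(io.StringIO(raw), header=None, na_values="?")

# Series.mean() skips NaN by default: (2 + 5 + 8) / 3
col_mean = df[2].mean()

# Fill the missing entry in column 2 with that mean.
filled = df[2].fillna(col_mean)

print(col_mean)          # 5.0
print(filled.tolist())   # [2.0, 5.0, 8.0, 5.0]
```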
I wrote this code based on this page and this question.
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
dataset_dataframe = pd.read_csv(DATASET_PATH, header = None)
for i in range(0 , len(dataset_dataframe.columns)-1):
    if dataset_dataframe[i].dtype != np.number:
        dataset_dataframe[i] = dataset_dataframe[i].replace('?' , np.nan)
        print("%s -\n %s" %(i , dataset_dataframe[i]))
        imputer_miss_data = SimpleImputer(missing_values=np.nan, strategy='mean')
        corrected_column = imputer_miss_data.fit_transform(dataset_dataframe[i])
        dataset_dataframe[i]=corrected_column
        print(dataset_dataframe[i])
But it doesn't work. What should I do to replace the missing data, which is shown as ? in the dataset, with the mean of its column using SimpleImputer?
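A minimal sketch of one way this might be done with SimpleImputer. It rests on two assumptions about the likely cause: that the ? strings keep the column non-numeric, and that `fit_transform` is being given a 1-D column while it expects 2-D input. The inline `raw` data is a hypothetical stand-in for `DATASET_PATH`.

```python
import io

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical inline data standing in for the file at DATASET_PATH.
raw = "0,1,2,3\n1,2,5,1.2\n2,4,8,2.3\n3,5,?,1\n"

# Parse "?" as NaN at read time so every column is numeric.
df = pd.read_csv(io.StringIO(raw), header=None, na_values="?")

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform expects 2-D input: pass the whole frame (or df[[2]]),
# not the 1-D Series df[2]. Each column is imputed with its own mean.
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(imputed)
```

Passing the whole frame imputes every column in one call, which avoids the per-column loop. If only some columns should be imputed, selecting them with double brackets (for example `df[[2]]`) keeps the input 2-D.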