I have a feature matrix with missing values (NaNs), so I need to impute those values first. However, the last line of the code below fails with the following error:
Expected sequence or array-like, got Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0)
I checked, and the reason seems to be that train_fea_imputed is not a NumPy array but an sklearn.preprocessing.imputation.Imputer object. How should I fix this?
BTW, if I use train_fea_imputed = imp.fit_transform(train_fea) instead, the code runs fine, but the returned train_fea_imputed has one dimension fewer than train_fea.
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import Imputer
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
train_fea_imputed = imp.fit(train_fea)
# train_fea_imputed = imp.fit_transform(train_fea)
rf = RandomForestClassifier(n_estimators=5000, n_jobs=1, min_samples_leaf=3)
rf.fit(train_fea_imputed, train_label)
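A note on the error, as a minimal sketch rather than scikit-learn's actual implementation: in scikit-learn, fit returns the estimator itself, and it is transform (or fit_transform) that returns the imputed data. The toy class below is hypothetical (ToyMeanImputer and train_fea_toy are made-up names, not part of scikit-learn) and uses only plain Python lists to show the pattern:

```python
import math


class ToyMeanImputer:
    """Hypothetical stand-in that mimics the fit/transform pattern only."""

    def fit(self, rows):
        # Learn one mean per column, ignoring NaNs.
        self.means_ = []
        for col in zip(*rows):
            seen = [v for v in col if not math.isnan(v)]
            self.means_.append(sum(seen) / len(seen) if seen else math.nan)
        return self  # like scikit-learn estimators, fit returns the object

    def transform(self, rows):
        # Replace every NaN with the mean learned for its column.
        return [
            [m if math.isnan(v) else v for v, m in zip(row, self.means_)]
            for row in rows
        ]


train_fea_toy = [[1.0, math.nan], [3.0, 4.0], [math.nan, 8.0]]

fitted = ToyMeanImputer().fit(train_fea_toy)  # an imputer object, not data
imputed = fitted.transform(train_fea_toy)     # the imputed rows
```

Applied to the code above, this suggests passing the result of imp.fit(train_fea).transform(train_fea), or equivalently imp.fit_transform(train_fea), to rf.fit, rather than the return value of imp.fit(train_fea).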
Update: I changed the imputer to
imp = Imputer(missing_values='NaN', strategy='mean', axis=1)
and now the dimension problem no longer occurs. I think there are some inherent issues in the imputing function. I will come back to this when I finish the project.
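One possible explanation for the shrinking output, offered as an assumption since train_fea is not shown: the scikit-learn documentation for Imputer states that with axis=0, columns that contained only missing values at fit time are discarded on transform. With axis=1 the means are computed per row, so no column is dropped, but each NaN is then filled with the mean of that sample's other features, which is a different imputation. The plain-Python sketch below (hypothetical helper names, not scikit-learn code) reproduces both behaviours on a toy matrix:

```python
import math


def nan_mean(values):
    """Mean of the non-NaN entries, or NaN if there are none."""
    seen = [v for v in values if not math.isnan(v)]
    return sum(seen) / len(seen) if seen else math.nan


def impute_by_column(rows):
    # Column means; columns with no observed value are dropped entirely.
    means = [nan_mean(col) for col in zip(*rows)]
    keep = [j for j, m in enumerate(means) if not math.isnan(m)]
    return [
        [means[j] if math.isnan(row[j]) else row[j] for j in keep]
        for row in rows
    ]


def impute_by_row(rows):
    # Row means; every column is kept, but values come from the same sample.
    return [
        [nan_mean(row) if math.isnan(v) else v for v in row]
        for row in rows
    ]


toy = [[1.0, math.nan, 3.0], [5.0, math.nan, 7.0]]

by_column = impute_by_column(toy)  # 2 columns left: the all-NaN one is gone
by_row = impute_by_row(toy)        # still 3 columns
```

If this is what happens here, checking train_fea for columns that are entirely NaN and dropping them explicitly before imputing may be a safer fix than switching to axis=1.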