I am trying to build a pipeline so that I can run GridSearchCV to find the best hyperparameters. I have already split the data into training and validation sets, and I have the following code:
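```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

cols = ["home_ownership", "purpose", "addr_state", "application_type", "term"]
column_transformer = make_pipeline(
    OneHotEncoder(categories=cols),
    OrdinalEncoder(categories=X["grade"]),
    "passthrough",
)
imputer = SimpleImputer(strategy="median")
scaler = StandardScaler()
# loss="log" was renamed to "log_loss" in scikit-learn 1.1
model = SGDClassifier(loss="log", random_state=42, n_jobs=-1, warm_start=True)
pipeline_sgdlogreg = make_pipeline(imputer, column_transformer, scaler, model)
```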
When I run GridSearchCV, I get the following error:
"cannot use median strategy with non-numeric data (...)"
I do not understand why I am getting this error. None of the categorical variables have missing values.
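A minimal reproduction, using a hypothetical two-column DataFrame (`loan_amnt` and the values are illustrative, not from the real data). As far as I can tell, `SimpleImputer(strategy="median")` tries to convert its whole input to floats when it is fitted, so a string column raises this `ValueError` whether or not that column contains missing values. Restricting the imputer to numeric columns avoids it:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frame: a numeric column with a gap, a string column with none.
df = pd.DataFrame({
    "loan_amnt": [1000.0, np.nan, 3000.0],
    "term": ["36 months", "60 months", "36 months"],
})

# Fitting on all columns fails because "term" cannot be converted to float.
try:
    SimpleImputer(strategy="median").fit(df)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Fitting on the numeric column only works; the NaN becomes the median (2000.0).
numeric_only = SimpleImputer(strategy="median").fit_transform(df[["loan_amnt"]])
print(numeric_only.ravel())
```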
The order of steps I am using is: imputation -> encoding -> scaling -> modeling.
Can anyone shed some light?