Here is my code. It starts with a pipeline (standardizing, replacing null values, one-hot encoding, and SelectKBest) followed by a LightGBM model to fit my data.
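from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from lightgbm import LGBMClassifier
numeric_features = ['X10', 'X11', 'X12', 'X13', 'X14']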
numeric_transformer = Pipeline(steps=[('scaler', StandardScaler())])
categorical_features = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9']
categorical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='constant', fill_value='FLAG_NAN')),('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_features),('cat', categorical_transformer, categorical_features)])
pipe = Pipeline(steps=[('preprocessor', preprocessor),('selector', SelectKBest(mutual_info_classif, k=5)),('classifier',LGBMClassifier())])
search_space = dict(classifier =[LGBMClassifier()])
X_train = train.drop(columns=['Y'])
X_test = test.drop(columns=['Y'])
y_train = train['Y']
y_test = test['Y']
grid_search_pipe = GridSearchCV(estimator=pipe, param_grid=search_space, scoring="neg_mean_squared_error", cv=5)
grid_search_pipe.fit(X_train, y_train, classifier__early_stopping_rounds=10, classifier__eval_metric="rmse", classifier__eval_set=[[X_test, y_test]])
And I got this error:
ValueError: DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in the following fields: X1, X2, X3, X4, X5, X6, X7, X8, X9
My data has some categorical columns (X1 to X9).
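One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: fit parameters given as `classifier__...` are forwarded by scikit-learn's `Pipeline` to the final step unchanged, so the `eval_set` frame would reach LightGBM with its raw string columns, without passing through the preprocessor or the selector. The sketch below uses hypothetical toy data (a reduced set of columns, `k=3`) and scikit-learn only; it shows the preprocessing steps being fitted on their own and then applied to the test frame, which yields a numeric matrix.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)

def make_frame(n):
    # Hypothetical toy data: one categorical column with missing values,
    # two numeric columns, and a binary target.
    return pd.DataFrame({
        "X1": rng.choice(np.array(["a", "b", np.nan], dtype=object), size=n),
        "X10": rng.normal(size=n),
        "X11": rng.normal(size=n),
        "Y": rng.integers(0, 2, size=n),
    })

train, test = make_frame(60), make_frame(20)
X_train, y_train = train.drop(columns=["Y"]), train["Y"]
X_test, y_test = test.drop(columns=["Y"]), test["Y"]

# Same preprocessing layout as in the question, on fewer columns.
preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[("scaler", StandardScaler())]), ["X10", "X11"]),
    ("cat", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="constant", fill_value="FLAG_NAN")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["X1"]),
])

# Everything except the classifier, fitted on the training data only.
pre = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("selector", SelectKBest(mutual_info_classif, k=3)),
])
pre.fit(X_train, y_train)

# The raw test frame still holds strings; the transformed one is numeric.
X_test_t = pre.transform(X_test)
print(X_test.dtypes)
print(X_test_t.shape, X_test_t.dtype)
```

Under this assumption, a transformed matrix like `X_test_t` (rather than the raw `X_test`) is what the classifier's `eval_set` would need to contain. Doing that inside `GridSearchCV` is less direct, because the preprocessing steps are refitted on each fold.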