I am struggling with a machine learning project in which I am trying to combine:
- a sklearn column transform to apply different transformers to my numerical and categorical features
- a pipeline to apply my different transformers and estimators
- a GridSearchCV to search for the best parameters.
As long as I fill in the parameters of my different transformers manually in my pipeline, the code works perfectly. But as soon as I try to pass lists of values to compare in my grid search parameters, I get all kinds of "invalid parameter" error messages.
Here is my code:
First, I divide my features into numerical and categorical ones:
import numpy as np
from sklearn.compose import make_column_selector
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Select columns by dtype: numeric columns vs. everything else
numerical_features = make_column_selector(dtype_include=np.number)
cat_features = make_column_selector(dtype_exclude=np.number)
Then I create 2 different preprocessing pipelines for numerical and categorical features:
numerical_pipeline = make_pipeline(KNNImputer())
cat_pipeline = make_pipeline(
    SimpleImputer(strategy='most_frequent'),
    OneHotEncoder(handle_unknown='ignore'),
)
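As far as I understand, make_pipeline names each step automatically after the lowercased class name, and these generated names are what the double-underscore parameter paths have to use. A minimal sketch to print them for the two pipelines above (the variables numerical_step_names and cat_step_names are only introduced here for illustration):

```python
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

numerical_pipeline = make_pipeline(KNNImputer())
cat_pipeline = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"),
)

# Pipeline.steps is a list of (name, estimator) tuples;
# collect only the auto-generated names.
numerical_step_names = [name for name, _ in numerical_pipeline.steps]
cat_step_names = [name for name, _ in cat_pipeline.steps]

print(numerical_step_names)
print(cat_step_names)
```

Note that the Python variable names (numerical_pipeline, cat_pipeline) do not appear among the step names.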
I then combined both into another pipeline, set my parameters, and ran my GridSearchCV code:
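# preprocessor is the column transformer that combines numerical_pipeline
# and cat_pipeline (its definition is not shown here)
model = make_pipeline(preprocessor, LinearRegression())
params = {
    'columntransformer__numerical_pipeline__knnimputer__n_neighbors': [1, 2, 3, 4, 5, 6, 7]
}
# Inner loop: 10-fold grid search over n_neighbors
grid = GridSearchCV(model, param_grid=params, scoring='r2', cv=10)
# Outer loop: 5-fold cross-validation of the whole grid search
cv = KFold(n_splits=5)
all_accuracies = cross_val_score(grid, X, y, cv=cv, scoring='r2')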
I tried different ways to declare the parameters, but never found the proper one. I always get an "invalid parameter" error message.
Could you please help me understand what went wrong?
Many thanks for your support, and take good care!
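For reference, here is a minimal sketch of how the parameter names accepted by the full model can be listed with get_params(). It assumes that preprocessor is built with make_column_transformer, which is not shown in my code above, so that part is an assumption. Under that assumption the two sub-pipelines appear to get generated names like pipeline-1 and pipeline-2 rather than the names of my Python variables. The names knn_param_names and candidate are only for illustration.

```python
import numpy as np
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

numerical_features = make_column_selector(dtype_include=np.number)
cat_features = make_column_selector(dtype_exclude=np.number)

numerical_pipeline = make_pipeline(KNNImputer())
cat_pipeline = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"),
)

# ASSUMPTION: the preprocessor is created with make_column_transformer,
# which generates the transformer names itself.
preprocessor = make_column_transformer(
    (numerical_pipeline, numerical_features),
    (cat_pipeline, cat_features),
)
model = make_pipeline(preprocessor, LinearRegression())

# get_params() lists every name that set_params() and GridSearchCV accept.
knn_param_names = [
    name for name in model.get_params() if name.endswith("n_neighbors")
]
print(knn_param_names)

# set_params raises ValueError for an unknown name, so this is a quick
# check of a candidate key before it goes into param_grid.
candidate = "columntransformer__pipeline-1__knnimputer__n_neighbors"
model.set_params(**{candidate: 3})
```

No data is needed for this check, since get_params() and set_params() only inspect the estimator objects and never call fit.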