I want to run a GridSearch over both my feature engineering steps and my model hyper-parameters. I also want the GridSearch to test whether transforming the target variable improves accuracy. I found a way to do that by passing the TransformedTargetRegressor parameters func and inverse_func to the grid search, with the candidate values being a transformation function and a custom identity function I defined.
Predictably, this produces a bunch of errors, because the GridSearch tries every combination of the two parameters, including mismatched ones: for example, TransformedTargetRegressor gets no transformation (identity) as func but the exponential function (np.expm1) as inverse_func.
Is there a way to pass pairs of hyper-parameters to the grid search, e.g. [(identity_, identity_), (np.log1p, np.expm1)]?
I have reviewed the scikit-learn docs and googled around, but I can't find an example of using the transformed target regressor in a grid search pipeline. Is this not common practice? Is there a reason to avoid doing this?
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
def identity_(x):
    return x
X_train = pd.DataFrame(pd.Series([0, 1, 2, 3, 4, 5, 6]))
y_train = pd.Series([5, 6, 7, 8, 7, 8, 9])
search_params_ = {
    'model__func': [np.log1p, identity_],
    'model__inverse_func': [np.expm1, identity_]
}
pipeline_ = Pipeline([
    ('scaler', StandardScaler()),
    ('model', TransformedTargetRegressor(regressor=LinearRegression(), check_inverse=False))
])
reg_search_cv = GridSearchCV(pipeline_, search_params_)
reg_search_cv.fit(X_train, y_train)
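
For reference, one possible way to express the pairs I am after might be to give GridSearchCV a list of dicts instead of a single dict, since each dict in the list is expanded as its own grid. The following is only a sketch under that assumption, not a confirmed best practice; the names paired_search_params, paired_search_cv and tried_pairs are made up for illustration, and cv=3 is an arbitrary choice so that every test fold of this tiny dataset has at least two samples.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def identity_(x):
    return x


X_train = pd.DataFrame(pd.Series([0, 1, 2, 3, 4, 5, 6]))
y_train = pd.Series([5, 6, 7, 8, 7, 8, 9])

# Each dict is expanded as a separate grid, so func and inverse_func
# are only combined with the partner listed in the same dict.
# (paired_search_params is a hypothetical name.)
paired_search_params = [
    {'model__func': [identity_], 'model__inverse_func': [identity_]},
    {'model__func': [np.log1p], 'model__inverse_func': [np.expm1]},
]

pipeline_ = Pipeline([
    ('scaler', StandardScaler()),
    ('model', TransformedTargetRegressor(regressor=LinearRegression())),
])

# cv=3 is an assumption made for this 7-row toy dataset.
paired_search_cv = GridSearchCV(pipeline_, paired_search_params, cv=3)
paired_search_cv.fit(X_train, y_train)

# Collect the (func, inverse_func) names that were actually evaluated.
tried_pairs = [
    (p['model__func'].__name__, p['model__inverse_func'].__name__)
    for p in paired_search_cv.cv_results_['params']
]
print(tried_pairs)
```

With this layout only the matched pairs should be evaluated, so the mismatched identity/expm1 and log1p/identity combinations never reach TransformedTargetRegressor. Further pipeline parameters would have to be repeated in each dict for them to be searched alongside both pairs.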