
I want to add target-variable transformers to my sklearn pipeline. For steps like PCA or any kind of regressor or classifier, sklearn supports parameter grids for cross-validation, e.g.:

        param_grid = [{
            "pca__n_components": [5, 10, 25, 50, 125, 250, 625, 1500, 3000],
            "rdf__n_estimators": n_estimators,
            "rdf__bootstrap": bootstrap,
            "rdf__max_depth": max_depth,
            "rdf__class_weight": class_weight}]

Is it possible to add target-variable transformers to this grid too? For example, I want to train my regressor first without transforming the target variable, then scale the target with PowerTransformer() and see whether that improves my results. Can these options be integrated into the parameter grid as well?

Ufuk Can Bicici

1 Answer


Yes, it is possible to put different transformers directly into your param_grid dictionary, using the pipeline step's name as the key:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import PowerTransformer, StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The step name ("transformer") is what the grid keys refer to.
    pipe = Pipeline([('transformer', PowerTransformer()), ('svc', SVC())])

    # A bare step name as key swaps the whole step object;
    # "<step>__<param>" keys tune a parameter of that step.
    param_grid = {"svc__C": [1, 10], "transformer": [PowerTransformer(), StandardScaler()]}

    clf = GridSearchCV(pipe, param_grid)
    clf.fit(X_train, y_train)
    print(clf.best_params_)
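Since the goal is to compare a model with and without a transform, note that a pipeline step can also be set to the string "passthrough" in the grid, which skips that step for those candidates (the step still has to be declared in the Pipeline). A minimal sketch reusing the setup above; like the original example, it acts on the features X, not on the target y, and the final test-set score line is only added for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("transformer", PowerTransformer()), ("svc", SVC())])

# "passthrough" disables the step for that candidate, so the search also
# evaluates the SVC on the untransformed features.
param_grid = {
    "svc__C": [1, 10],
    "transformer": ["passthrough", PowerTransformer(), StandardScaler()],
}

clf = GridSearchCV(pipe, param_grid)
clf.fit(X_train, y_train)
print(clf.best_params_)
print(clf.score(X_test, y_test))  # accuracy of the refitted best pipeline
```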
Kim Tang
  • Nice, I was thinking of something along these lines. Do you know if you can avoid adding the transformer (or any step) to the Pipeline but still add it to the grid? – yatu Sep 08 '20 at 13:21
  • No, because GridSearchCV only takes your estimator and that estimator's parameters, so there is no place for an additional transformer. You therefore have to add the transformer to the pipeline that you pass as the estimator to GridSearchCV, so that you can then reach the transformer via the pipeline parameters. Have a look at the documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html – Kim Tang Sep 08 '20 at 13:26
  • Thanks for the answer. In my case, I need to transform the target variable y, so is it still possible to integrate that into the pipeline? Transformers based on TransformerMixin only seem to work on X, not y. – Ufuk Can Bicici Sep 08 '20 at 13:26
  • Yes, that's what I thought. Good to know for sure now :) – yatu Sep 08 '20 at 13:43
  • Yes, that should be possible as well. Sklearn has several transformers for the targets. Have a look here: https://scikit-learn.org/stable/modules/preprocessing_targets.html or also at the thread here https://stackoverflow.com/questions/18602489/using-a-transformer-estimator-to-transform-the-target-labels-in-sklearn-pipeli, where they refer to the TransformedTargetRegressor. If you don't find a solution with those, just build a loop around your GridSearchCV and transform your 'y' outside of it. – Kim Tang Sep 08 '20 at 13:56
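For the target-variable case, here is a minimal sketch of the `TransformedTargetRegressor` route mentioned above. It assumes a pipeline shaped like the question's: a `pca` step followed by a random forest regressor, here wrapped in a step named `ttr` (a hypothetical name). The data and all grid values are made up for illustration. Setting the wrapper's `transformer` to `None` leaves y untransformed, so a single grid compares "no target transform" against `PowerTransformer()`:

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

# Hypothetical toy data standing in for the asker's dataset.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Features still flow through ordinary pipeline steps (PCA); the regressor
# is wrapped so that its target can be transformed and inverse-transformed.
pipe = Pipeline([
    ("pca", PCA()),
    ("ttr", TransformedTargetRegressor(regressor=RandomForestRegressor(random_state=0))),
])

param_grid = {
    "pca__n_components": [5, 10],
    # None means identity: the target is left as-is for these candidates.
    "ttr__transformer": [None, PowerTransformer()],
    # Parameters of the wrapped regressor are reached via "regressor__".
    "ttr__regressor__n_estimators": [10, 30],
}

clf = GridSearchCV(pipe, param_grid, cv=3)
clf.fit(X_train, y_train)
print(clf.best_params_)
print(clf.score(X_test, y_test))  # R^2 computed on the original scale of y
```

Because `TransformedTargetRegressor` inverse-transforms its predictions before they are scored, both target options are evaluated on the original scale of y, so their CV scores are directly comparable. That is not automatic with the loop-around-GridSearchCV alternative: if y is transformed outside the search, the scores are computed on the transformed target unless you inverse-transform the predictions yourself.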