
I understand that one can chain several estimators that implement the transform method to transform X (the feature set) in sklearn.pipeline. However, I have a use case where I would also like to transform the target labels (e.g. transform the labels to [1...K] instead of [0, K-1]), and I would love to do that as a component in my pipeline. Is it possible to do that at all using sklearn.pipeline?

vkmv

3 Answers


There is now a nicer way to do this built into scikit-learn: compose.TransformedTargetRegressor.

When constructing these objects you give them a regressor and a transformer. When you .fit() them they transform the targets before regressing, and when you .predict() them they transform their predicted targets back to the original space.

It's important to note that you can pass them a pipeline object, so they should interface nicely with your existing setup. For example, take the following setup where I train a ridge regression to predict 1 target given 2 features:

# Imports
import numpy as np
from sklearn import compose, linear_model, metrics, pipeline, preprocessing

# Generate some training and test features and targets
X_train = np.random.rand(200).reshape(100, 2)
y_train = 1.2 * X_train[:, 0] + 3.4 * X_train[:, 1] + 5.6
X_test = np.random.rand(20).reshape(10, 2)
y_test = 1.2 * X_test[:, 0] + 3.4 * X_test[:, 1] + 5.6

# Define my model and scalers
ridge = linear_model.Ridge(alpha=1e-2)
scaler = preprocessing.StandardScaler()
minmax = preprocessing.MinMaxScaler(feature_range=(-1, 1))

# Construct a pipeline using these methods
pipe = pipeline.make_pipeline(scaler, ridge)

# Construct a TransformedTargetRegressor using this pipeline
# ** So far the set-up has been standard **
regr = compose.TransformedTargetRegressor(regressor=pipe, transformer=minmax)

# Fit and train the regr like you would a pipeline
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
print("MAE: {}".format(metrics.mean_absolute_error(y_test, y_pred)))

This still isn't quite as smooth as I'd like it to be; for example, you can access the regressor contained by a TransformedTargetRegressor using .regressor_, but the coefficients stored there are untransformed. This means there are some extra hoops to jump through if you want to work your way back to the equation that generated the data.
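If you do want the original-space equation, here's a minimal sketch of one way to undo the scaling by hand, using the fitted objects from the example above (the step names "standardscaler" and "ridge" are the defaults that make_pipeline generates from the class names):

# Grab the fitted pieces (.regressor_ and .transformer_ are set by .fit())
inner = regr.regressor_                    # the fitted pipeline
ridge_fit = inner.named_steps["ridge"]     # Ridge trained in scaled space
xsc = inner.named_steps["standardscaler"]  # fitted StandardScaler for X
ysc = regr.transformer_                    # fitted MinMaxScaler for y

# Undo the X standardisation: x_scaled = (x - mean_) / scale_
w = ridge_fit.coef_ / xsc.scale_
b = ridge_fit.intercept_ - np.dot(ridge_fit.coef_, xsc.mean_ / xsc.scale_)

# Undo the y scaling: MinMaxScaler maps y -> y * scale_ + min_
coef_orig = w / ysc.scale_
intercept_orig = (b - ysc.min_) / ysc.scale_

print(coef_orig, intercept_orig)  # should be close to [1.2, 3.4] and 5.6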

Ari Cooper-Davis
  • Do you know a similar functionality for classifiers? – Selman Tunc Yilmaz Dec 05 '19 at 22:31
  • The Python library [`mlinsights`](https://github.com/sdpython/mlinsights), which extends scikit-learn, has added this functionality and called it [`TransformedTargetClassifier2`](https://github.com/sdpython/mlinsights/blob/777bbb85d0203f38aeb6bb5b90c3b87426fbf2db/mlinsights/mlmodel/target_predictors.py#L135). – Ari Cooper-Davis Apr 21 '21 at 12:46

No, pipelines will always pass y through unchanged. Do the transformation outside the pipeline.

(This is a known design flaw in scikit-learn, but it's never been pressing enough to change or extend the API.)
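For example, here's a minimal sketch of handling the question's label shift outside the pipeline (the classifier choice and variable names are illustrative assumptions, not part of the original answer):

# Shift labels from [0, K-1] to [1, K] before fitting, and shift
# predictions back afterwards; the pipeline itself never touches y.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train + 1)     # transform y going in
y_pred = clf.predict(X_test) - 1  # invert the transform coming out

The shift itself is trivial here, but the same pattern works for any invertible transformation of y.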

Fred Foo
  • Did this ever get reviewed? This would be very convenient to parallelise aspects of your pipeline where the y is a bounded hyper-parameter also. – jtromans Jun 17 '18 at 14:57
  • [fred-foo](https://stackoverflow.com/users/166749/fred-foo) I think it makes sense to cite `TransformedTargetRegressor`, which was added to the API well after you initially answered this question. – mcguip Feb 10 '20 at 09:40

You could add the label column to the end of the training data, apply your transformation, and then delete that column before training your model. That's not very professional, but it's enough.
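A rough sketch of that workaround, assuming a joint transformation function and an estimator of your own (transform_together and model are hypothetical names):

import numpy as np

# Append the labels as the last column, transform everything together,
# then split the columns back apart before fitting.
Xy = np.column_stack([X_train, y_train])
Xy = transform_together(Xy)           # hypothetical joint transformation
X_new, y_new = Xy[:, :-1], Xy[:, -1]
model.fit(X_new, y_new)               # hypothetical estimator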

Amjad