The complete answer shows two equivalent ways of solving this. Skip to the TL;DR for the solution.
Step 1: Set up the problem
Let's replace the target_logtransform and target_inverselog functions. scikit-learn has built-in tools for both:
import numpy as np
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import MinMaxScaler
log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
scaler = MinMaxScaler()
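Before going further, here is a small sketch of how these two objects behave together. It assumes a made-up target array (y_demo is not part of the question's data) and only checks that applying the inverses in reverse order gets us back to where we started:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

# Made-up sample for illustration; values must be > -1 for log1p to be defined.
y_demo = np.array([[0.9], [0.5], [0.0], [-0.3]])

log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
scaler = MinMaxScaler()

# Forward: take the log first, then scale the result to [0, 1].
y_demo_log = log_transformer.fit_transform(y_demo)
y_demo_scaled = scaler.fit_transform(y_demo_log)

# Backward: undo the steps in reverse order (unscale, then un-log).
y_demo_back = log_transformer.inverse_transform(
    scaler.inverse_transform(y_demo_scaled)
)

print(np.allclose(y_demo, y_demo_back))
```

The order matters: the scaler was fitted on the logged values, so it has to be inverted before expm1 is applied.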
One way to do this is to manually rescale our target y. We'll do this as a sanity check to make sure we're correct later:
# Initialize some data for reproducing results:
X = np.array([-0.916,-0.916,-0.836,-0.768,-0.700,-0.608,-0.528,-0.472,-0.404,-0.300,-0.184,-0.0840,0.0480,0.168,0.328,0.468,0.640,0.760,0.872]).reshape(-1, 1)
y = np.array([0.899,0.899,0.895,0.871,0.827,0.747,0.607,0.479,0.339,0.167,0.00294,-0.0971,-0.181,-0.233,-0.309,-0.333,-0.333,-0.321,-0.301]).reshape(-1, 1)
# Manually do this in two steps
y_log = log_transformer.fit_transform(y)
y_log_scaled = scaler.fit_transform(y_log)
print(y_log_scaled)
Output:
[[1. ]
[1. ],
[0.9979847 ],
[0.98580284],
...
[0.03378575],
[0. ],
[0. ],
[0.01704215],
[0.04478737]]
Step 2: Defining TwoTransformers to log then scale
Let's define a TwoTransformers class extending scikit-learn's TransformerMixin and BaseEstimator classes, and implement its fit_transform and inverse_transform methods. The first looks similar to our manual approach, and because the fitted scaler is kept on the object, the inverse is easy to define:
from sklearn.base import TransformerMixin, BaseEstimator

class TwoTransformers(TransformerMixin, BaseEstimator):
    def fit_transform(self, y):
        # Build and fit both steps, keeping them on the instance for the inverse
        self.log_transformer = FunctionTransformer(
            func=np.log1p,
            inverse_func=np.expm1,
        )
        self.scaler = MinMaxScaler()
        y_log = self.log_transformer.fit_transform(y)
        y_log_scaled = self.scaler.fit_transform(y_log)
        return y_log_scaled

    def inverse_transform(self, y):
        # Undo the steps in reverse order: unscale first, then un-log
        y_unscaled = self.scaler.inverse_transform(y)
        y_unscaled_unlog = self.log_transformer.inverse_transform(y_unscaled)
        return y_unscaled_unlog
And we can show it is equivalent to our earlier results:
two_steps = TwoTransformers()
print(np.all(y_log_scaled == two_steps.fit_transform(y)))
print(two_steps.fit_transform(y))
Output matches:
True
[[1. ]
[1. ]
[0.9979847 ]
[0.98580284]
...
[0.03378575]
[0. ]
[0. ]
[0.01704215]
[0.04478737]]
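As an extra check, we can also confirm that inverse_transform undoes fit_transform. This is a minimal sketch that repeats the class from Step 2 so it runs on its own; y_demo is made-up data, not the question's:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler


class TwoTransformers(TransformerMixin, BaseEstimator):
    """Same class as in Step 2, repeated so this snippet is self-contained."""

    def fit_transform(self, y):
        self.log_transformer = FunctionTransformer(
            func=np.log1p,
            inverse_func=np.expm1,
        )
        self.scaler = MinMaxScaler()
        y_log = self.log_transformer.fit_transform(y)
        return self.scaler.fit_transform(y_log)

    def inverse_transform(self, y):
        y_unscaled = self.scaler.inverse_transform(y)
        return self.log_transformer.inverse_transform(y_unscaled)


# Made-up target values (all > -1 so log1p is defined).
y_demo = np.array([[0.9], [0.5], [0.0], [-0.3]])

two_steps = TwoTransformers()
y_demo_scaled = two_steps.fit_transform(y_demo)
y_demo_back = two_steps.inverse_transform(y_demo_scaled)

# Round trip: we should recover the original values up to floating-point error.
print(np.allclose(y_demo, y_demo_back))
```

Note that inverse_transform relies on state set by the most recent fit_transform call, so it has to be called on the same instance.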
Step 3: Integrating with TransformedTargetRegressor
Let's demo with LinearRegression (ignoring cross-validation for now) to make sure things are working:
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor
two_steps = TwoTransformers()
regr_trans = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=two_steps.fit_transform,
    inverse_func=two_steps.inverse_transform,
)
regr_trans.fit(X, y)
y_pred_two_step = regr_trans.predict(X)
For comparison, here is an equivalent version where we fit our model on the y_log_scaled variable from earlier, then manually undo our operations:
clf = LinearRegression()
clf.fit(X, y_log_scaled)
# Predict in the transformed (log, then scaled) space
y_pred = clf.predict(X)
# Undo the scaling, then undo the log
y_pred_unscale = scaler.inverse_transform(y_pred)
y_pred_unscale_unlog = log_transformer.inverse_transform(y_pred_unscale)
Again, we can show that both approaches get the same result:
print(np.all(y_pred_two_step == y_pred_unscale_unlog))
print(np.c_[y_pred_two_step, y_pred_unscale_unlog])
Output:
True
[[ 0.9212786 0.9212786 ]
[ 0.9212786 0.9212786 ]
[ 0.81566306 0.81566306]
...
[-0.36027418 -0.36027418]
[-0.41229248 -0.41229248]
[-0.45701964 -0.45701964]]
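As an alternative sketch (not the approach used above): TransformedTargetRegressor also accepts a transformer argument, and a Pipeline built with make_pipeline exposes inverse_transform when all of its steps do. Assuming that fits your use case, the custom class may not be needed. X_demo and y_demo are made-up data, and log_then_scale and regr_pipe are names chosen here for illustration:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

# Made-up data for illustration; targets stay above -1 so log1p is defined.
X_demo = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y_demo = (0.3 - 0.6 * X_demo[:, 0] + 0.1 * X_demo[:, 0] ** 2).reshape(-1, 1)

# Log first, then scale; the pipeline inverts the steps in reverse order.
log_then_scale = make_pipeline(
    FunctionTransformer(func=np.log1p, inverse_func=np.expm1),
    MinMaxScaler(),
)

regr_pipe = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=log_then_scale,
)
regr_pipe.fit(X_demo, y_demo)
y_pred_pipe = regr_pipe.predict(X_demo)

# Manual equivalent, mirroring the comparison above.
log_t = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
scaler = MinMaxScaler()
y_scaled = scaler.fit_transform(log_t.fit_transform(y_demo))
manual = LinearRegression().fit(X_demo, y_scaled)
y_pred_manual = log_t.inverse_transform(
    scaler.inverse_transform(manual.predict(X_demo))
)

print(np.allclose(y_pred_pipe, y_pred_manual))
```

The comparison uses np.allclose rather than exact equality, since the two code paths are not guaranteed to agree to the last bit.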
TL;DR: Final Code
Define a class with fit_transform and inverse_transform, then pass those two methods of an instance to TransformedTargetRegressor as func and inverse_func:
import numpy as np
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor
X = np.array([-0.916,-0.916,-0.836,-0.768,-0.700,-0.608,-0.528,-0.472,-0.404,-0.300,-0.184,-0.0840,0.0480,0.168,0.328,0.468,0.640,0.760,0.872]).reshape(-1, 1)
y = np.array([0.899,0.899,0.895,0.871,0.827,0.747,0.607,0.479,0.339,0.167,0.00294,-0.0971,-0.181,-0.233,-0.309,-0.333,-0.333,-0.321,-0.301]).reshape(-1, 1)
class TwoTransformers(TransformerMixin, BaseEstimator):
    def fit_transform(self, y):
        self.log_transformer = FunctionTransformer(
            func=np.log1p,
            inverse_func=np.expm1,
        )
        self.scaler = MinMaxScaler()
        y_log = self.log_transformer.fit_transform(y)
        y_log_scaled = self.scaler.fit_transform(y_log)
        return y_log_scaled

    def inverse_transform(self, y):
        y_unscaled = self.scaler.inverse_transform(y)
        y_unscaled_unlog = self.log_transformer.inverse_transform(y_unscaled)
        return y_unscaled_unlog
two_steps = TwoTransformers()
regr_trans = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=two_steps.fit_transform,
    inverse_func=two_steps.inverse_transform,
)
regr_trans.fit(X, y)
y_pred_two_step = regr_trans.predict(X)
print(y_pred_two_step)
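For the cross-validation that Step 3 set aside, here is one possible sketch. scikit-learn clones the estimator for every fold, and a transformer with its own fit, transform, and inverse_transform, passed through the transformer argument, may survive that more predictably than bound methods that carry state. LogThenScale is a hypothetical class written for this example, and X_demo and y_demo are made-up data:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import MinMaxScaler


class LogThenScale(TransformerMixin, BaseEstimator):
    """Hypothetical target transformer: log1p, then min-max scaling."""

    def fit(self, X, y=None):
        # X is the array being transformed (here, the regression target).
        self.scaler_ = MinMaxScaler().fit(np.log1p(X))
        return self

    def transform(self, X):
        return self.scaler_.transform(np.log1p(X))

    def inverse_transform(self, X):
        # Reverse order: unscale first, then undo the log.
        return np.expm1(self.scaler_.inverse_transform(X))


# Made-up data for illustration; targets stay above -1 so log1p is defined.
X_demo = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y_demo = (0.3 - 0.6 * X_demo[:, 0] + 0.1 * X_demo[:, 0] ** 2).reshape(-1, 1)

regr_cv = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=LogThenScale(),
)

# Each fold fits a fresh clone, so the scaler only sees that fold's training targets.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(regr_cv, X_demo, y_demo, cv=cv)
print(scores)
```

With this layout the scaling parameters are learned inside each training fold, so nothing from the held-out fold leaks into the target transformation.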