Could someone give me a hand with this Pipeline? It works perfectly if I remove the target transformation, but I need to transform the target to adjust its distribution.
The goal of doing the transformation inside the pipeline is to get predictions back in the target's original units.
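import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the IterativeImputer import)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer, StandardScaler

# Column names grouped by dtype, plus the target column name
numeric_features = [col for col in x_train.select_dtypes(np.number)]
categorical_features = [col for col in x_train.select_dtypes(object)]
target = ['msrp']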
target_trans = make_pipeline(
    PowerTransformer()
)
numeric_transformer = make_pipeline(
    IterativeImputer(),
    StandardScaler()
)
categorical_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown='ignore', sparse=False)
)
col_transformer = make_column_transformer(
    (target_trans, target),
    (categorical_transformer, categorical_features),
    (numeric_transformer, numeric_features),
    verbose=1
)
# `model` is defined elsewhere (not shown here); the parameter names below
# refer to its steps named 'xgb' and 'feature_selection'.
params = {
    'xgb__learning_rate': np.linspace(0, 1, 10),
    'xgb__max_depth': range(1, 30, 2),
    'xgb__colsample_bytree': np.linspace(0, 1, 10),
    'xgb__n_estimators': range(1, 300, 50),
    'feature_selection__k': range(1, 1063, 100),
}
grid_search = RandomizedSearchCV(model, params, cv=2, n_jobs=10, refit=True, verbose=1)
pipe = Pipeline(steps=[
    ('preprocess', col_transformer),
    ('grid_search', grid_search),
])
And this is the error message:
KeyError Traceback (most recent call last)
File /opt/homebrew/lib/python3.10/site-packages/pandas/core/indexes/base.py:3800, in Index.get_loc(self, key, method, tolerance)
3799 try:
-> 3800 return self._engine.get_loc(casted_key)
3801 except KeyError as err:
File /opt/homebrew/lib/python3.10/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()
File /opt/homebrew/lib/python3.10/site-packages/pandas/_libs/index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'msrp'
ValueError: A given column is not a column of the dataframe
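
One direction that might be worth trying, offered as a sketch rather than a verified fix: the KeyError suggests that 'msrp' is not a column of x_train, and a ColumnTransformer only ever sees the feature matrix, never y. scikit-learn's TransformedTargetRegressor is designed for the stated goal: it fits a transformer on y, trains the wrapped regressor on the transformed target, and applies the inverse transform to predictions. Everything below is an assumption for illustration: the toy data, the column names "horsepower" and "make", and Ridge as a stand-in for the real estimator (the RandomizedSearchCV over the XGBoost pipeline).

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor, make_column_transformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer, StandardScaler

# Hypothetical toy data standing in for the real x_train / y_train.
rng = np.random.default_rng(0)
n = 200
x_train = pd.DataFrame({
    "horsepower": rng.uniform(80, 400, size=n),
    "make": rng.choice(["audi", "bmw", "ford"], size=n),
})
# Right-skewed target kept separate from the features:
# 'msrp' is NOT a column of x_train.
y_train = np.exp(
    10 + 0.004 * x_train["horsepower"].to_numpy() + rng.normal(0, 0.2, size=n)
)

# Feature preprocessing only; the target is handled further down.
preprocess = make_column_transformer(
    (StandardScaler(), ["horsepower"]),
    (OneHotEncoder(handle_unknown="ignore"), ["make"]),
)

# Ridge is a placeholder for the real estimator.
regressor = make_pipeline(preprocess, Ridge())

# Fits PowerTransformer on y, trains the regressor on the transformed y,
# and calls inverse_transform on the predictions.
model = TransformedTargetRegressor(
    regressor=regressor,
    transformer=PowerTransformer(),
)
model.fit(x_train, y_train)

preds = model.predict(x_train)  # on the original msrp scale
```

With this layout the target transformer is removed from make_column_transformer, and the search object could presumably take the place of Ridge as the wrapped regressor.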