"ValueError: A given column is not a column of the dataframe" Transforming the target variable on a pipeline

Question

Could someone give me a hand with this Pipeline? It works perfectly if I remove the target transformation, but I need to transform the target to adjust its distribution.

The goal of transforming it inside the pipeline is to get the predictions back in the original units of the target.

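import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the IterativeImputer import)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer, StandardScaler

numeric_features = [col for col in x_train.select_dtypes(np.number)]
categorical_features = [col for col in x_train.select_dtypes(object)]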

target = ['msrp']

target_trans = make_pipeline(
                PowerTransformer())

numeric_transformer = make_pipeline(
    IterativeImputer(),
    StandardScaler()
)

categorical_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown='ignore', sparse=False)
)

col_transformer = make_column_transformer(
    (target_trans, target), 
    (categorical_transformer, categorical_features),
    (numeric_transformer, numeric_features),
    verbose=1)


params = {
    'xgb__learning_rate': np.linspace(0, 1, 10),
    'xgb__max_depth': range(1, 30, 2),
    'xgb__colsample_bytree': np.linspace(0, 1, 10),
    'xgb__n_estimators': range(1, 300, 50),
    'feature_selection__k': range(1, 1063, 100),
}


grid_search = RandomizedSearchCV(model, params, cv=2, n_jobs=10, refit=True, verbose=1)


pipe = Pipeline(steps=[
    ('preprocess', col_transformer),
    ('grid_search', grid_search)])

And this is the error message:

KeyError                                  Traceback (most recent call last)
File /opt/homebrew/lib/python3.10/site-packages/pandas/core/indexes/base.py:3800, in Index.get_loc(self, key, method, tolerance)
   3799 try:
-> 3800     return self._engine.get_loc(casted_key)
   3801 except KeyError as err:

File /opt/homebrew/lib/python3.10/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File /opt/homebrew/lib/python3.10/site-packages/pandas/_libs/index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'msrp'

ValueError: A given column is not a column of the dataframe

Does this answer your question? [Using a transformer (estimator) to transform the target labels in sklearn.pipeline](https://stackoverflow.com/questions/18602489/using-a-transformer-estimator-to-transform-the-target-labels-in-sklearn-pipeli) — dx2-66, Sep 22 '22 at 07:59
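
For context on that suggestion: a ColumnTransformer only ever sees the feature frame passed as X, so if 'msrp' was split off into y, a KeyError on 'msrp' followed by this ValueError would be the expected result. One option scikit-learn offers for transforming the target is TransformedTargetRegressor, which transforms y before fitting and inverts the transform at predict time. Below is a minimal sketch under stated assumptions: the toy data, the column names 'horsepower' and 'make', the Ridge regressor (a stand-in for the XGBoost model, which the question does not show) and the SimpleImputer for numeric columns (in place of IterativeImputer) are all hypothetical choices made only to keep the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer, StandardScaler

# Hypothetical toy data standing in for x_train / y_train.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "horsepower": rng.normal(200.0, 40.0, n),
    "make": rng.choice(["audi", "bmw", "ford"], n),
})
# Right-skewed, strictly positive target, kept OUT of the feature frame.
log_msrp = 10.0 + 0.005 * (X["horsepower"] - 200.0) + rng.normal(0.0, 0.1, n)
y = pd.Series(np.exp(log_msrp), name="msrp")

numeric_features = ["horsepower"]
categorical_features = ["make"]

# The column transformer lists feature columns only; 'msrp' is not among them.
preprocess = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
     numeric_features),
    (make_pipeline(SimpleImputer(strategy="most_frequent"),
                   OneHotEncoder(handle_unknown="ignore")),
     categorical_features),
    sparse_threshold=0,  # always return a dense array
)

# fit() applies PowerTransformer to y before fitting the inner pipeline;
# predict() applies the inverse transform, so predictions come back in
# the original units of y.
model = TransformedTargetRegressor(
    regressor=make_pipeline(preprocess, Ridge()),  # Ridge is an assumed stand-in
    transformer=PowerTransformer(),
)
model.fit(X, y)
preds = model.predict(X)
print(preds[:3])
```

If a wrapper like this is tuned with RandomizedSearchCV, the parameter names of the inner pipeline would need a `regressor__` prefix, because the pipeline is nested under the `regressor` argument.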
