I have scikit-learn 1.2.1 installed and I would like to create custom transformer classes to use with a ColumnTransformer instance. The problem is that when I set the output to a pandas DataFrame, I get the error message shown below the code.

Here is the code:

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

class TestClass(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def fit_transform(self, X, y=None):
        self.fit(X, y)
        return self.transform(X, y)

    def transform(self, X, y=None):
        return X

ct = ColumnTransformer(transformers=[('simple_imputer', SimpleImputer(strategy='most_frequent'), ['paymentmethod']),
                                 ('test_class', TestClass(), ['dependents', 'seniorcitizen', 'partner'])])
ct.set_output(transform='pandas') # I really need 'result' as pandas dataframe
result = ct.fit_transform(X_train, y_train)

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[22], line 3
      1 ct = ColumnTransformer(transformers=[('simple_imputer', SimpleImputer(strategy='most_frequent'), ['paymentmethod']),
      2                                     ('test_class', TestClass(), ['dependents', 'seniorcitizen', 'partner'])])
----> 3 ct.set_output(transform='pandas')
      4 result = ct.fit_transform(X_train, y_train)

File ~/.local/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:287, in ColumnTransformer.set_output(self, transform)
    279 transformers = (
    280     trans
    281     for _, trans, _ in chain(
   (...)
    284     if trans not in {"passthrough", "drop"}
    285 )
    286 for trans in transformers:
--> 287     _safe_set_output(trans, transform=transform)
    289 return self

File ~/.local/lib/python3.10/site-packages/sklearn/utils/_set_output.py:275, in _safe_set_output(estimator, transform)
    272     return
    274 if not hasattr(estimator, "set_output"):
--> 275     raise ValueError(
    276         f"Unable to configure output for {estimator} because `set_output` "
    277         "is not available."
    278     )
    279 return estimator.set_output(transform=transform)

ValueError: Unable to configure output for TestClass() because `set_output` is not available.

How can I fix this?

  • Might https://stackoverflow.com/questions/75026592/how-to-create-pandas-output-for-custom-transformers/75036830#75036830 help? – amiola Feb 03 '23 at 18:30