
I created a pass-through wrapper class around an existing class from sklearn and it does not behave as expected:

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

tiny_df = pd.DataFrame({'x': ['a', 'b']})

class Foo(OrdinalEncoder):

    def __init__(self, *args, **kwargs):
        super().__init__(self, *args, **kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self


oe = OrdinalEncoder()
oe.fit(tiny_df) # works fine
foo = Foo()
foo.fit(tiny_df) # fails

The relevant part of the error message I receive is:

~\.conda\envs\pytorch\lib\site-packages\sklearn\preprocessing\_encoders.py in _fit(self, X, handle_unknown)
     69                         raise ValueError("Unsorted categories are not "
     70                                          "supported for numerical categories")
---> 71             if len(self._categories) != n_features:
     72                 raise ValueError("Shape mismatch: if n_values is an array,"
     73                                  " it has to be of shape (n_features,).")

TypeError: object of type 'Foo' has no len()

Somehow the parent's private attribute _categories does not seem to get set, even though I've called the parent constructor in the __init__() method of my class. I must be missing something simple here and would appreciate any help!

  • do you also need to pass *args to the constructor? – OregonTrail Oct 12 '19 at 06:28
  • I added `*args` in calls to both `__init__()` methods, and that did not change the error I'm getting. – kgolyaev Oct 12 '19 at 06:33
  • There is a `__len__` magic method that you can implement. – OregonTrail Oct 12 '19 at 06:37
  • I doubt this will solve the problem - I have no idea how `__len__` is implemented in the parent class, and the whole idea of creating a child class is that I wouldn't have to - all the parent methods should _just work_. No? – kgolyaev Oct 12 '19 at 06:44
  • 3
    The error message implies that `self._categories` is an object of type `Foo`. I think the problem is with the `self` in the `__init__`. Change it to `super().__init__(*args, **kwargs)`. The `self` is already included in the `super()`. https://stackoverflow.com/questions/222877/what-does-super-do-in-python – hpaulj Oct 12 '19 at 07:19

1 Answer


You don't have to pass self again to super(): the call is already bound to the instance. Also, scikit-learn estimators should always specify their parameters explicitly in the signature of their __init__ (no varargs); keeping *args raises a RuntimeError (shown at the end of this answer), so it has to be removed as well. I have modified your code as below:

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

tiny_df = pd.DataFrame({'x': ['a', 'b']})

class Foo(OrdinalEncoder):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self


oe = OrdinalEncoder()
oe.fit(tiny_df) # works fine
foo = Foo()
foo.fit(tiny_df) # works fine too

Sample output:

foo.transform(tiny_df)
array([[0.],
       [1.]])
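
To see why the original version misbehaves, here is a minimal plain-Python sketch; the Parent/Child names and the categories default are made up for illustration (in the scikit-learn release from the question, categories happened to be the first parameter of OrdinalEncoder's __init__, so the stray self ended up being treated as the categories setting):

class Parent:
    def __init__(self, categories='auto'):
        self.categories = categories  # stores whatever arrives positionally


class Child(Parent):
    def __init__(self, *args, **kwargs):
        # reproduces the bug on purpose: the explicit self is forwarded
        # to Parent as its first ordinary argument
        super().__init__(self, *args, **kwargs)


c = Child()
print(c.categories is c)  # True -- the instance itself was stored as categories

That is exactly the shape of the TypeError in the question: a Foo instance sits where a list of categories was expected, so len() fails on it.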

A little extra: here is what happens if you keep the varargs in the signature.

class Foo(OrdinalEncoder):

    def __init__(self, *args, **kwargs):
        super().__init__(*args,**kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self

And when you create Foo:

foo = Foo()

RuntimeError: scikit-learn estimators should always specify their parameters in the signature of their __init__ (no varargs). <class '__main__.Foo'> with constructor (self, *args, **kwargs) doesn't  follow this convention.
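
A side note on why scikit-learn enforces this (my reading of the convention, not something stated in the error message): BaseEstimator.get_params() collects parameter names by inspecting __init__'s signature, which is why *args is rejected outright. It also means the **kwargs-only wrapper above exposes no parameters to get_params() or clone(), so for anything beyond a quick pass-through you may want to spell the parameters out explicitly. A quick check, assuming a reasonably recent scikit-learn:

from sklearn.preprocessing import OrdinalEncoder

class Foo(OrdinalEncoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

print(OrdinalEncoder().get_params())  # {'categories': 'auto', 'dtype': ..., ...}
print(Foo().get_params())             # {} -- the wrapper's signature names no parameters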