Reclassing a Python DataFrame by setting __class__

Question

I'm trying to create a subclass of DataFrame that extends it with a few properties and methods. In addition to the default constructor there are a few others, like the one below, that initialize the DataFrame from an SQL table and then add a few attributes (I simplified it and left a dummy just to demonstrate the problem). Once I get the initial df, I "convert" it to my class with the statement df.__class__ = Cls. It seems somewhat weird to me, but according to a few posts on this issue (e.g. Reclassing an instance in Python) it's a valid way to go, and it seems to work most of the time.

The problem appears when I use a method of the parent class (in this case DataFrame.append) that returns a new instance of the object: after sdf2 = sdf1.append(item), the class of sdf2 is DataFrame and not SubDataFrame, and consequently print('sdf2: ', sdf2.name) fails because 'DataFrame' has no attribute 'name'. Bottom line: by naively using a standard DataFrame method, my object was corrupted.

I can solve it by overriding the (virtual) 'append' method in my subclass, but then I would need to do the same for many methods, and if I cannot use the inherited methods there is no sense in subclassing at all (I could just hold the DataFrame as a member variable of my class). I guess there is a best practice for this kind of subclassing; I just don't know it. Any help is much appreciated. Thanks!

Adi

import pandas as pd
import pandas.io.sql as pdsql

class SubDataFrame(pd.DataFrame):

    @classmethod
    def create(Cls):

        # df = pdsql.read_sql(db_query, db_connection)
        d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
        df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
        df.__class__ = Cls
        df.name = 'Test Obj'
        return df


if __name__ == "__main__":

    sdf1 = SubDataFrame.create()
    print('sdf1: ', sdf1.__class__)   # prints "sdf1:  <class '__main__.SubDataFrame'>"
    print('sdf1: ', sdf1.name)        # prints "sdf1:  Test Obj"

    item = sdf1.iloc[0].copy()
    sdf2 = sdf1.append(item)
    print('sdf2: ', sdf2.__class__)   # prints: "sdf2:  <class 'pandas.core.frame.DataFrame'>"
    print('sdf2: ', sdf2.name)  # exception: "AttributeError: 'DataFrame' object has no attribute 'name'"
    pass

I tried using super() as suggested by @BrenBarn. I read the reference (on the unbound superclass classmethod) but still can't make it work. These are my tests:

import pandas as pd
import pandas.io.sql as pdsql

class SubDataFrame(pd.DataFrame):

    @classmethod
    def create_reset_class(Cls):

        d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
        df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
        df.__class__ = Cls
        df.name = 'Test Obj'
        return df

    @classmethod
    def create_using_super(Cls):

        d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
        df = super(SubDataFrame, Cls).__init__(d, index=['a', 'b', 'c', 'd'])
        df.name = 'Test Obj'
        return df

    def __init__(self):

        d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
        df = super(SubDataFrame, self).__init__(d, index=['a', 'b', 'c', 'd'])
        df.name = 'Test Obj'
        return df

if __name__ == "__main__":

    sdf3 = SubDataFrame.create_using_super()
    sdf4 = SubDataFrame()

    sdf1 = SubDataFrame.create_reset_class()
    print('sdf1: ', sdf1.__class__)
    print('sdf1: ', sdf1.name)

    item = sdf1.iloc[0].copy()
    sdf2 = sdf1.append(item)
    print('sdf2: ', sdf2.__class__)
    print('sdf2: ', sdf2.name)
    pass

Note that SubDataFrame has a default __init__ constructor, while create() is my (non-default) constructor and is a classmethod. Inside it I call pandas.DataFrame(), which is the standard bound constructor and expects self rather than Cls. So I tried two options:

a. df = super(SubDataFrame, Cls).__init__(d, index=['a', 'b', 'c', 'd']) raises an AttributeError in File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 208: 'dict' object has no attribute '_init_dict'

b. Using a standard bound constructor __init__ doesn't raise any error, but df comes back as None (from df = super(SubDataFrame, self).__init__(d, index=['a', 'b', 'c', 'd']))

Am I using super() incorrectly? Is it a pandas bug? Any other ideas? Thanks!
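Both symptoms look like ordinary Python behaviour rather than anything specific to pandas. The sketch below tries to reproduce them with a small stand-in class; `Base`, `Sub`, `create_broken` and `broken_error` are hypothetical names used only for illustration, and `Base` is not the real DataFrame. It shows that `__init__` always returns None (option b), and that reaching `__init__` through `super(Sub, cls)` inside a classmethod yields a plain function, so the dict ends up in the `self` slot (option a).

```python
class Base:
    """Hypothetical stand-in for pandas.DataFrame (illustration only)."""

    def __init__(self, data=None, index=None):
        self.data = data
        self.index = index


class Sub(Base):
    def __init__(self, data=None, index=None):
        # __init__ initialises self in place; its return value is always None,
        # so there is nothing useful to assign from it or to return.
        init_result = super().__init__(data, index=index)
        assert init_result is None
        self.name = "Test Obj"

    @classmethod
    def create(cls):
        d = {"one": [1.0, 2.0], "two": [4.0, 3.0]}
        # Calling cls(...) allocates the instance and runs __init__ on it.
        return cls(d, index=["a", "b"])

    @classmethod
    def create_broken(cls):
        d = {"one": [1.0, 2.0], "two": [4.0, 3.0]}
        # Looked up on a class, __init__ is an ordinary function, so d is
        # passed as "self" and the attribute assignment on the dict fails.
        return super(Sub, cls).__init__(d, index=["a", "b"])


sub = Sub.create()
print(type(sub).__name__, sub.name)

try:
    Sub.create_broken()
except AttributeError as exc:
    broken_error = exc
    print("create_broken failed:", exc)
```

Under these assumptions, the alternative constructor simply calls `cls(...)` and sets its extra attributes on the returned instance, without touching `__class__`.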

Why are you setting `__class__` that way at all? You should be able to just use `super` to do whatever initialization you need to do. — BrenBarn, Jan 29 '15 at 21:14
@BrenBarn -- IIRC, pandas dataframes are subclasses of `numpy.ndarray` and the latter is known to be hard to subclass properly. I'd imagine those difficulties would carry over into dataframes as well. I think that a (potentially) easier route here is to create a new type by composition rather than traditional inheritance. — mgilson, Jan 29 '15 at 21:26
@mgilson: I believe DataFrame is no longer a subclass of ndarray in recent pandas versions. There were some changes a few versions ago that made subclassing DataFrame smoother (see [this issue](https://github.com/pydata/pandas/issues/60)). — BrenBarn, Jan 30 '15 at 03:17
@BrenBarn: if I understand correctly super() requires 'self' and is relevant for instance methods (e.g. __init__) but not for class methods like this one, isn't it? In any case I tried various options for using super: — Adi E, Jan 31 '15 at 09:55
@BrenBarn: Sorry, my previous comment was truncated. If I understand correctly, super() requires 'self' and is relevant for instance methods (e.g. __init__) but not for class methods like this one, isn't it? In any case I tried various options for using super: `df = super(SubDataFrame).__init__(d, index=['a', 'b', 'c', 'd'])` or `df = super().__init__(d, index=['a', 'b', 'c', 'd'])` and few others and all ended with exceptions (I'm using Python 3.4 and Pandas 0.15.2). Probably I miss something, can you suggest the correct way for doing it with super()? thanks! — Adi E, Jan 31 '15 at 10:04
@AdiE: You can use `super(SubDataFrame, cls)` to get an unbound superclass classmethod. See [this question](http://stackoverflow.com/questions/1817183/using-super-with-a-class-method). — BrenBarn, Jan 31 '15 at 18:10
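For the original problem (inherited methods returning a plain DataFrame), a minimal sketch of the subclassing hooks that pandas documents, `_constructor` and `_metadata`, might look like the following. It assumes a reasonably recent pandas release rather than the 0.15.2 mentioned above. Because `DataFrame.append` was removed in pandas 2.0, the sketch uses `copy()` and `head()` as examples of inherited methods that return a new object; whether a given operation propagates `_metadata` attributes can vary by method and version, so treat this as a starting point only.

```python
import pandas as pd


class SubDataFrame(pd.DataFrame):
    # Attribute names listed here are treated as instance metadata and are
    # copied to the result by operations that call __finalize__.
    _metadata = ["name"]

    @property
    def _constructor(self):
        # Tells pandas which class to build when a method returns a new frame.
        return SubDataFrame

    @classmethod
    def create(cls):
        # df = pdsql.read_sql(db_query, db_connection) would go here instead
        d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
        sdf = cls(d, index=["a", "b", "c", "d"])
        sdf.name = "Test Obj"
        return sdf


sdf1 = SubDataFrame.create()
sdf2 = sdf1.copy()    # inherited method returning a new object
sdf3 = sdf1.head(2)   # another inherited method returning a new object

print(type(sdf2).__name__, sdf2.name)
print(type(sdf3).__name__, sdf3.name)
```

With this pattern there is no need to assign `__class__` by hand, since the instance is built as a `SubDataFrame` from the start.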
