I'm trying to create a subclass of DataFrame that extends it with a few properties and methods. In addition to the default constructor there are a few others, like the one below, that initialize the DataFrame from an SQL table and then add a few attributes (I simplified it and left a dummy just to demonstrate the problem). Once I get the initial df, I "convert" it to my class with the statement df.__class__ = Cls.

This seems somewhat weird to me, but after reading a few posts on the issue (e.g. Reclassing an instance in Python) it appears to be a valid way to go, and it works most of the time. The problem comes when I use a method of the parent class (in this case DataFrame.append) that returns a new instance of the object: after sdf2 = sdf1.append(item), the class of sdf2 is DataFrame and not SubDataFrame, and consequently print('sdf2: ', sdf2.name) fails because 'DataFrame' object has no attribute 'name'.

The bottom line: by naively using a standard DataFrame method, my object was corrupted. I could solve it by writing an overriding append method in my subclass, but then I would need to do the same for many methods, and if I can't use the inherited methods there is no point in subclassing at all (I could just hold the DataFrame as a member variable of my class).
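The member-variable alternative mentioned above could look roughly like this. It is a minimal sketch under my own assumptions: NamedFrame and its forwarding rule are hypothetical names, not part of pandas. Attribute lookups the wrapper doesn't define are forwarded to the wrapped DataFrame through __getattr__, and any DataFrame returned by a forwarded method call is wrapped again so that name survives.

```python
import pandas as pd


class NamedFrame:
    """Hypothetical wrapper that holds a DataFrame instead of subclassing it."""

    def __init__(self, df, name):
        self.df = df
        self.name = name

    def __getattr__(self, attr):
        # Only called for attributes NamedFrame itself does not define.
        df = self.__dict__.get("df")
        if df is None:  # not initialized yet (e.g. during copy/unpickling)
            raise AttributeError(attr)
        value = getattr(df, attr)
        if not callable(value):
            return value  # plain attributes such as .shape or .columns

        def forward(*args, **kwargs):
            result = value(*args, **kwargs)
            # Re-wrap DataFrame results so the extra attribute is kept.
            if isinstance(result, pd.DataFrame):
                return NamedFrame(result, self.name)
            return result

        return forward


df = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]},
                  index=['a', 'b', 'c', 'd'])
nf = NamedFrame(df, 'Test Obj')
print(type(nf.head(2)).__name__, nf.head(2).name)  # NamedFrame Test Obj
```

The catch is that special methods bypass __getattr__: nf['one'] and nf + 1 fail outright, while nf.iloc[...] and properties such as nf.T return plain DataFrames. Filling those gaps means writing them by hand, which is the per-method work the question wants to avoid.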
I guess there must be a best practice for this kind of subclassing; I just don't know it. Any help is much appreciated.
Thanks!
Adi
import pandas as pd
import pandas.io.sql as pdsql

class SubDataFrame(pd.DataFrame):

    @classmethod
    def create(Cls):
        # df = pdsql.read_sql(db_query, db_connection)
        d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
        df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
        df.__class__ = Cls  # re-class the plain DataFrame as SubDataFrame
        df.name = 'Test Obj'
        return df

if __name__ == "__main__":
    sdf1 = SubDataFrame.create()
    print('sdf1: ', sdf1.__class__)  # prints "sdf1: <class '__main__.SubDataFrame'>"
    print('sdf1: ', sdf1.name)       # prints "sdf1: Test Obj"
    item = sdf1.iloc[0].copy()
    # Note: DataFrame.append was removed in pandas 2.0 (pd.concat replaces it)
    sdf2 = sdf1.append(item)
    print('sdf2: ', sdf2.__class__)  # prints "sdf2: <class 'pandas.core.frame.DataFrame'>"
    print('sdf2: ', sdf2.name)       # AttributeError: 'DataFrame' object has no attribute 'name'
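A minimal sketch of the hook the pandas documentation describes for subclassing: override the _constructor property so that methods which build a new object build a SubDataFrame, and list custom attributes in _metadata so pandas copies them to results that go through its __finalize__ propagation. Two things differ from the code above and are my assumptions, not the original design: create() instantiates cls directly instead of reassigning __class__, and pd.concat stands in for DataFrame.append, which no longer exists in pandas 2.0. Details may vary between pandas versions.

```python
import pandas as pd


class SubDataFrame(pd.DataFrame):
    # Custom attributes listed here are copied to new objects by
    # pandas' __finalize__ (e.g. after copy(), slicing, head()).
    _metadata = ['name']

    @property
    def _constructor(self):
        # pandas calls this to build the result of most methods, so
        # results come back as SubDataFrame instead of plain DataFrame.
        return SubDataFrame

    @classmethod
    def create(cls):
        d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
        sdf = cls(d, index=['a', 'b', 'c', 'd'])  # build the subclass directly
        sdf.name = 'Test Obj'
        return sdf


sdf1 = SubDataFrame.create()
# pd.concat replaces DataFrame.append (removed in pandas 2.0)
sdf2 = pd.concat([sdf1, sdf1.iloc[[0]]])
print(type(sdf2).__name__)  # SubDataFrame
print(sdf1.head(2).name)    # Test Obj
```

As far as I know, _metadata is only copied by operations that propagate from a single source frame (copy(), slicing, head() and similar). pd.concat keeps the SubDataFrame type through _constructor but does not carry name over, so sdf2.name would still need to be set again after concatenation.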
I tried using super() as suggested by @BrenB. I read the reference (regarding calling an unbound superclass classmethod) but still can't make it work. These are my tests:
import pandas as pd
import pandas.io.sql as pdsql

class SubDataFrame(pd.DataFrame):

    @classmethod
    def create_reset_class(Cls):
        d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
        df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
        df.__class__ = Cls
        df.name = 'Test Obj'
        return df

    @classmethod
    def create_using_super(Cls):
        d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
        df = super(SubDataFrame, Cls).__init__(d, index=['a', 'b', 'c', 'd'])
        df.name = 'Test Obj'
        return df

    def __init__(self):
        d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
        df = super(SubDataFrame, self).__init__(d, index=['a', 'b', 'c', 'd'])
        df.name = 'Test Obj'
        return df

if __name__ == "__main__":
    sdf3 = SubDataFrame.create_using_super()
    sdf4 = SubDataFrame()
    sdf1 = SubDataFrame.create_reset_class()
    print('sdf1: ', sdf1.__class__)
    print('sdf1: ', sdf1.name)
    item = sdf1.iloc[0].copy()
    sdf2 = sdf1.append(item)
    print('sdf2: ', sdf2.__class__)
    print('sdf2: ', sdf2.name)
Note that my SubDataFrame has a default __init__ constructor, while create() is my (non-default) constructor and is a classmethod. Inside it I call pandas.DataFrame(), whose constructor is a standard bound method expecting self, not Cls. So I tried two options:
a. df = super(SubDataFrame, Cls).__init__(d, index=['a', 'b', 'c', 'd']) (in the create_using_super classmethod) raises AttributeError: 'dict' object has no attribute '_init_dict', in File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 208.
b. Using a standard bound __init__ constructor, df = super(SubDataFrame, self).__init__(d, index=['a', 'b', 'c', 'd']) doesn't generate any error, but df comes back as None.
Am I using super() incorrectly? Is it a pandas bug? Any other ideas? Thanks!
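As far as I can tell, both failures come from how super() and __init__ work in plain Python rather than from a pandas bug. The sketch below uses a hypothetical stand-in class Base in place of pd.DataFrame so the mechanism is visible without pandas internals. In option a, a classmethod has no instance yet, so super(Child, cls).__init__ resolves to the plain function and the dict is passed in the self slot (the same shape of error as the '_init_dict' one above). In option b, __init__ initializes self in place and always returns None, so assigning its result gives None. For SubDataFrame, the equivalent fix would be to call cls(d, index=['a', 'b', 'c', 'd']) inside create() and never use the return value of super().__init__.

```python
class Base:
    """Hypothetical stand-in for pd.DataFrame."""

    def __init__(self, data=None):
        self.data = data  # initializes self in place; returns None


class Child(Base):
    def __init__(self, data=None):
        # Option b, corrected: call the parent initializer for its side
        # effect on self and do not use its (None) return value.
        super().__init__(data)
        self.name = 'Test Obj'

    @classmethod
    def create_using_super(cls, data):
        # Option a: there is no instance here, so this resolves to the plain
        # function Base.__init__ and `data` is passed as `self`.
        # Base.__init__ then tries data.data = None -> AttributeError.
        return super(Child, cls).__init__(data)

    @classmethod
    def create(cls, data):
        # Alternative constructor: cls(...) creates and initializes the object.
        return cls(data)


obj = Child.create({'one': [1., 2.]})
print(type(obj).__name__, obj.name)  # Child Test Obj

try:
    Child.create_using_super({'one': [1., 2.]})
except AttributeError as exc:
    print('option a fails:', exc)
```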