I am building a Table class to make it easy to retrieve data from a database, manipulate it arbitrarily in memory, and then save it back. Ideally, these tables work both in the Python interpreter and in normal code. "Work" means I can use all standard pandas DataFrame features, as well as all custom features of the Table class.
Generally, the tables contain data I use for academic research or personal interest, so the user base is currently just me, but for portability I'm trying to write as generically as possible.
I have seen several threads (example 1, example 2) discussing whether to subclass DataFrame or to use composition. After trying to walk through pandas' subclassing guide, I decided to go with composition because pandas itself says this is easier.
The problem is, I want to be able to call any DataFrame function, property, or attribute on a Table, but to do so I have to keep track of every attribute I code into the Table class. See below; the points of interest are metadata and __getattr__, everything else is meant to be illustrative.
class Table(object):
    metadata = ['db', 'data', 'name', 'clean', 'refresh', 'save']

    def __getattr__(self, name):
        if name not in Table.metadata:
            return getattr(self.data, name)  # self.data is the DataFrame
        raise AttributeError(name)  # avoid silently returning None

    def __init__(self, db, name):
        # set up Table-specific values

    def refresh(self):
        # undo all changes since last save

    etc...
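For what it's worth, __getattr__ is only invoked after normal attribute lookup fails, so attributes set in __init__ and methods defined on the class never reach it. A minimal, stand-alone sketch of the delegation behavior I'm describing (FakeFrame is a hypothetical stand-in for a real DataFrame, so the example runs without pandas):

```python
# FakeFrame is a hypothetical stand-in for a pandas DataFrame,
# used here only so the sketch runs without pandas installed.
class FakeFrame:
    shape = (3, 2)

    def head(self):
        return "first rows"


class Wrapper:
    def __init__(self):
        self.data = FakeFrame()  # the wrapped "DataFrame"
        self.name = "demo"       # a wrapper-specific attribute

    def __getattr__(self, name):
        # Only called when normal lookup fails, so self.data, self.name,
        # and methods defined on Wrapper are found before we get here.
        return getattr(self.data, name)


w = Wrapper()
print(w.name)    # 'demo'; found on the wrapper itself, __getattr__ not called
print(w.shape)   # (3, 2); lookup fails on Wrapper, delegated to FakeFrame
print(w.head())  # 'first rows'; method delegated the same way
```

This is the behavior I'm relying on, but it still forces me to be careful that every Table-specific name is a real attribute of the wrapper, which is the bookkeeping I'd like to avoid.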
Obviously, having to explicitly specify the Table attributes versus the DataFrame ones is not ideal (though, to my understanding, this is how pandas implements column names as attributes). I could write out tablename.data.foo, but I find that unintuitive and non-Pythonic. Is there a better way to achieve the same functionality?