This question is based on this question regarding lazy attributes for Python classes.
I really like the solution given there:
Here is an example implementation of a lazy property decorator:
import functools

def lazyprop(fn):
    # Name of the instance attribute that caches the computed value.
    attr_name = '_lazy_' + fn.__name__

    @property
    @functools.wraps(fn)
    def _lazyprop(self):
        # Compute the value on first access only, then reuse the cached copy.
        if not hasattr(self, attr_name):
            setattr(self, attr_name, fn(self))
        return getattr(self, attr_name)

    return _lazyprop

class Test(object):
    @lazyprop
    def a(self):
        print('generating "a"')
        return list(range(5))
Interactive session:
>>> t = Test()
>>> t.__dict__
{}
>>> t.a
generating "a"
[0, 1, 2, 3, 4]
>>> t.__dict__
{'_lazy_a': [0, 1, 2, 3, 4]}
>>> t.a
[0, 1, 2, 3, 4]
This solution allows you to create a @lazyprop for any attribute. However, you must write a method for each attribute that you wish to be lazy. I need something that will work for attributes whose names I won't know ahead of time (of which there may be many).
These attributes are DataFrames read in from HDF5 files. Each file contains many different tables, the names of which I won't know. I have an excellent function, get_all_table_names(filename), that returns the names of all the tables in the file. Currently, I loop through all the names and read the tables in one after another. There are, however, several tens of GB of data, which take several minutes to read in.
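For reference, the eager loading I do now looks roughly like the sketch below. It is a simplified illustration, not my actual code: the class name Data, the table names, and the bodies of the two helper functions are stand-ins invented so the snippet runs without any HDF5 files. Only the helper names and their argument order match what I really use.

```python
# Stand-ins for the real helpers so the sketch runs without any HDF5 files.
# Only their names and argument order are real; the bodies are invented.
def get_all_table_names(filename):
    return ['table_a', 'table_b', 'table_c']  # hypothetical table names


def read_to_pandas(directory_of_files, table_name, number_of_files_to_read):
    # The real function returns a DataFrame; a placeholder string stands in here.
    print('reading ' + table_name)
    return 'data for ' + table_name


class Data(object):
    """Eager loader: every table is read as soon as the object is created."""

    def __init__(self, directory_of_files, filename, number_of_files_to_read):
        for name in get_all_table_names(filename):
            # Each table becomes an attribute, whether or not it is ever used.
            setattr(self, name,
                    read_to_pandas(directory_of_files, name,
                                   number_of_files_to_read))


data = Data('some_directory', 'some_file.h5', 1)
```

Every table is read inside __init__, so the full cost is paid up front even if only one table is ever touched.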
Is there a way to only actually read in a table when a method calls that table? The example given here is perfect, except that I need to know the name of the table ahead of time.
EDIT
The code to load data from an HDF5 file to a Pandas DataFrame looks like the following.
df = read_to_pandas(directory_of_files, 'table_name', number_of_files_to_read)