Suppose you have a large saved dataset that takes significant time to load, and whose various aspects are expensive to calculate, and you write a Python class to load the data and expose its many attributes to the rest of your Python code. Is there a Pythonic way of exposing these attributes so that each one is loaded/calculated only when first required, and so that repeated requests for the same attribute don't load/calculate it again?
Alternatively, is there a paradigm I'm missing that would remove the need for this entirely, like using an ordinary method to calculate/load the big data and just making sure I reuse the calculated result instead of recalculating it?
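For example, the manual version of that would look roughly like this (a sketch; load_foo and its placeholder calculation just stand in for whatever slow loading I'd actually do):

def load_foo(dataset_path):
    # do things that take a lot of time (placeholder calculation)
    return sum(range(10 ** 7))

foo = load_foo('path/to/dataset')  # load once, up front
x = foo * 5
y = foo + 3  # reuse the loaded value instead of recalculating

The downside is that every caller has to remember to pass the loaded value around instead of just asking the object for it.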
This is what I've thought of so far, but it feels like I'm missing something.
class Dataset(object):
    def __init__(self, dataset_path):
        self.dataset_path = dataset_path

    @property
    def foo(self):
        try:
            return self._foo
        except AttributeError:
            # first access: load, cache, and return
            self._foo = self.load_foo(self.dataset_path)
            return self._foo

    @property
    def bar(self):
        try:
            return self._bar
        except AttributeError:
            # first access: load, cache, and return
            self._bar = self.load_bar(self.dataset_path)
            return self._bar

    def load_foo(self, dataset_path):
        # do things that take a lot of time
        return foo

    def load_bar(self, dataset_path):
        # do things that take a lot of time
        return bar

if __name__ == '__main__':
    dataset = Dataset('path/to/dataset')
    # use dataset information
    x = dataset.foo * 5
    y = dataset.foo + 3
The advantage of this class is that I do not have to keep track of whether I have previously accessed dataset.foo. Additionally, if I never end up using dataset.bar for a particular object, no loading time is wasted on it, as it would be if I loaded bar in __init__.
As I add more and more attributes, though, this becomes very repetitive and ugly. Is there a more Pythonic way to do this? Or should I even be doing this in the first place?
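The best alternative I've come up with is factoring the try/except pattern into a small decorator, roughly like the sketch below (untested; foo's body is a placeholder for the real slow calculation, and I gather newer Python versions ship functools.cached_property for the same purpose):

import functools

def lazy_property(func):
    # Like @property, but caches func's result in self._<name>
    # on first access and returns the cached value afterwards.
    attr = '_' + func.__name__

    @property
    @functools.wraps(func)
    def wrapper(self):
        if not hasattr(self, attr):
            setattr(self, attr, func(self))
        return getattr(self, attr)

    return wrapper

class Dataset(object):
    def __init__(self, dataset_path):
        self.dataset_path = dataset_path

    @lazy_property
    def foo(self):
        # do things that take a lot of time (placeholder calculation)
        return sum(range(10 ** 7))

if __name__ == '__main__':
    dataset = Dataset('path/to/dataset')
    x = dataset.foo * 5  # computed here, on first access
    y = dataset.foo + 3  # reuses the cached value

Is something along these lines considered idiomatic, or is there a standard tool I should be using instead?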