
Suppose you have a large saved dataset that takes significant time to load, and computing various aspects of it is also slow. You write a Python class to load the data and expose its many attributes to the rest of your code. Is there a Pythonic way to expose these attributes so that each one is loaded/calculated only when it is first requested, and later requests reuse the result instead of loading/calculating it again?

Alternatively, is there a paradigm I'm missing that would remove the need for this? For example, using a method to load/calculate the big data, and just making sure I reuse the result instead of recalculating it?
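A minimal sketch of that alternative, assuming the expensive step can be written as a plain function of the dataset path: functools.lru_cache memoizes the function, so a second call with the same path returns the stored result instead of redoing the work. The load_calls counter and the len(...) body are hypothetical placeholders for the real load.

```python
from functools import lru_cache

# Hypothetical counter, only here to show how often the slow work runs.
load_calls = {"foo": 0}


@lru_cache(maxsize=None)
def load_foo(dataset_path):
    """Stand-in for an expensive load; results are memoized per dataset_path."""
    load_calls["foo"] += 1
    # Placeholder for the real slow load/computation.
    return len(dataset_path)


x = load_foo("path/to/dataset") * 5  # first call: runs the function body
y = load_foo("path/to/dataset") + 3  # second call: answered from the cache
```

The trade-off is that the cache lives at module level and keeps every result alive for the life of the program (unless you call load_foo.cache_clear()), and every argument must be hashable.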

This is what I've thought of so far, but it feels like I'm missing something.

class Dataset(object):
    def __init__(self, dataset_path):
        self.dataset_path = dataset_path

    @property
    def foo(self):
        try:
            return self._foo
        except AttributeError:
            self._foo = self.load_foo(self.dataset_path)
            return self._foo

    @property
    def bar(self):
        try:
            return self._bar
        except AttributeError:
            self._bar = self.load_bar(self.dataset_path)
            return self._bar

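    def load_foo(self, dataset_path):
        # do things that take a lot of time
        foo = 2  # placeholder for the result of the slow load
        return foo

    def load_bar(self, dataset_path):
        # do things that take a lot of time
        bar = 3  # placeholder for the result of the slow load
        return bar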

if __name__ == '__main__':
    dataset = Dataset('path/to/dataset')

    # use dataset information
    x = dataset.foo * 5
    y = dataset.foo + 3
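The repeated try/except in each property can be factored out into a small decorator, so each lazily loaded attribute needs only a single method. A sketch, where lazy_property is a hypothetical helper name (not a standard library function), the result is cached under "_" plus the method name to mirror the pattern in the question, and the loader body and load_count counter are placeholders:

```python
import functools


def lazy_property(method):
    """Hypothetical helper: a read-only property whose value is computed once.

    The result is cached on the instance as '_' + method name, the same
    try/except pattern the question writes out by hand for each property.
    """
    attr_name = "_" + method.__name__

    @property
    @functools.wraps(method)
    def wrapper(self):
        try:
            return getattr(self, attr_name)  # already loaded: reuse it
        except AttributeError:
            value = method(self)  # first access: do the slow work
            setattr(self, attr_name, value)
            return value

    return wrapper


class Dataset:
    def __init__(self, dataset_path):
        self.dataset_path = dataset_path
        self.load_count = 0  # hypothetical counter, for demonstration only

    @lazy_property
    def foo(self):
        self.load_count += 1
        return 2  # placeholder for the slow load


dataset = Dataset("path/to/dataset")
x = dataset.foo * 5
y = dataset.foo + 3
```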

The advantage of this class is that I don't have to keep track of whether I have previously accessed dataset.foo. Additionally, if I never use dataset.bar for this particular object, no time is wasted loading it, as there would be if I loaded bar in __init__.

As I add more and more attributes, this becomes very repetitive and ugly. Is there a more Pythonic way to do this? Or should I not be doing this in the first place?
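On Python 3.8 and later, the standard library provides this lazy, compute-once behavior as functools.cached_property: the decorated method runs on first access, and its result is stored in the instance's __dict__, which shadows the descriptor so later accesses skip the method entirely. A minimal sketch, with placeholder loaders and a hypothetical load_count counter added only to show how often loading happens:

```python
from functools import cached_property  # Python 3.8+


class Dataset:
    def __init__(self, dataset_path):
        self.dataset_path = dataset_path
        self.load_count = 0  # hypothetical counter, for demonstration only

    @cached_property
    def foo(self):
        # Runs on first access only; the result is then stored in
        # self.__dict__["foo"] and returned directly from then on.
        self.load_count += 1
        return self.load_foo(self.dataset_path)

    @cached_property
    def bar(self):
        self.load_count += 1
        return self.load_bar(self.dataset_path)

    def load_foo(self, dataset_path):
        return 2  # placeholder for the slow load

    def load_bar(self, dataset_path):
        return 3  # placeholder for the slow load


dataset = Dataset("path/to/dataset")
x = dataset.foo * 5
y = dataset.foo + 3
```

To force a reload, delete the cached value with del dataset.foo. Because the value is stored in the instance __dict__, cached_property does not work on classes that define __slots__ without a __dict__ slot.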

lachness


If you aren't going to use this class for anything else, you could do this:

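class Dataset(object):
    def __init__(self, dataset_path):
        self.dataset_path = dataset_path

    def __getattr__(self, item):
        # __getattr__ only runs when normal lookup (__getattribute__) fails,
        # i.e. when item has not been loaded and cached yet
        if item.startswith('__'):
            # don't try to "load" special names such as __setstate__
            raise AttributeError(item)
        value = self.load_named_attribute(self.dataset_path, item)
        # cache it on the instance so later lookups never reach __getattr__
        setattr(self, item, value)
        return value

    def load_named_attribute(self, dataset_path, item_name):
        # this function should load the data based on the input name; for now just return 5
        itemdata = 5
        return itemdata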

a = Dataset('dummystr')
print(a.dummyvar)

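This overrides __getattr__, which Python calls only when the normal attribute lookup (__getattribute__) fails. The first time you access a missing attribute such as a.dummyvar, it loads the value from dataset_path and stores it on the instance; later accesses find the stored value through the normal lookup and don't load it again. Note that a misspelled attribute name will also trigger a load, so load_named_attribute should raise AttributeError for names it doesn't recognize.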
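A variation on this approach, sketched under the assumption that each attribute has its own loader method named load_ followed by the attribute name (as with the question's load_foo and load_bar): __getattr__ dispatches to the matching loader and raises AttributeError for names without one, so a typo fails loudly instead of triggering a load. The loader bodies and the load_count counter are placeholders.

```python
class Dataset:
    def __init__(self, dataset_path):
        self.dataset_path = dataset_path
        self.load_count = 0  # hypothetical counter, for demonstration only

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. the value isn't cached yet.
        # The loader is looked up on the class, not the instance, so a missing
        # loader cannot re-enter __getattr__ and recurse.
        loader = getattr(type(self), "load_" + name, None)
        if loader is None:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}")
        value = loader(self, self.dataset_path)
        setattr(self, name, value)  # cache: the next lookup finds it in __dict__
        return value

    def load_foo(self, dataset_path):
        self.load_count += 1
        return 2  # placeholder for the slow load

    def load_bar(self, dataset_path):
        self.load_count += 1
        return 3  # placeholder for the slow load


dataset = Dataset("path/to/dataset")
x = dataset.foo * 5
y = dataset.foo + 3
```

Looking the loader up with getattr(self, ...) instead would call __getattr__ again for a missing loader name, which in turn looks up another missing name, and so on until a RecursionError.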

Marc