1

I have a pipeline that contains several variations of the following data class, which I've tried to maximally simplify for this example:

from pathlib import Path

class Data(object):

    def __init__(self):
        self.filepath = Path("filepath")  

    def download(self, use_cache=True):
        if use_cache and self.filepath.exists():
            return self.filepath
        # Code to download 
        print(f"downloading to {self.filepath}")

I'd like to perform the download only if (a) the file doesn't exist, and (b) the user provides the arg use_cache = False. As mentioned, I have several classes with similar download methods that only vary in the # Code to download portion. I'd like to find a way to make the caching logic generic for all of these.

I was thinking about using a decorator:

from functools import wraps

def cache(filepath, use_cache):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if filepath.exists() and use_cache:
                return filepath 
            func(*args, **kwargs)
        return wrapper
    return decorator

However, I'm not sure how to pass the filepath and use_cache arguments to the cache decorator. Am I going about this completely wrong? How else could I solve this problem?

turnerm
  • 1,381
  • 11
  • 14
  • The wrapper should consume the `use_cache` argument, and if I understand correctly, it won't need to call `download` in that case. I'd post an answer but your code is a mess. I really don't understand why you're using a class with a blank `__init__()` and methods that don't really use `self`. – wjandrea Nov 01 '21 at 22:32
  • It would help a lot if you provided a [mre]. Like, instead of downloading a file, you could try something simpler like inserting keys in a dict. – wjandrea Nov 01 '21 at 22:34
  • 1
    @wjandrea can you clarify what you mean by "The wrapper should consume the use_cache argument, and if I understand correctly, it won't need to call download in that case. "? I've fixed the typos and tried to clarify the question. I'm trying to keep the class as simple as possible for this example, in reality it's a bit more complex. – turnerm Nov 02 '21 at 06:26
  • Excellent clarification! OK, I think you already knew "it won't need to call `download`", and I was confused about the `use_cache` argument because of the way you wrote it before but now I see what you're trying to do. I'll write an answer, but two things to clarify still: 1) `Path` is unused; maybe you mean `self.filepath = Path("filepath")`? 2) The title is vague; I'm thinking it'd be clearer to put something like "How do I write a decorator to retrieve cached files?" Keep in mind a good title will attract better answers :) – wjandrea Nov 02 '21 at 16:40
  • (1) Yes sorry that was a typo (2) Good point, I've updated the title. Thanks! – turnerm Nov 08 '21 at 12:08

1 Answers1

1

wrapper() needs to intercept the arguments self and use_cache. Then, if I understand correctly, it doesn't need to pass on use_cache to download().

As well, your decorator seems to be one level too deep. Really you've written a decorator factory, which doesn't seem necessary here.

from functools import wraps
from pathlib import Path

def filecache(func):
    @wraps(func)
    def wrapper(self, *args, use_cache=True, **kwargs):
        if self.filepath.exists() and use_cache:
            return self.filepath
        return func(self, *args, **kwargs)  # Also added return here
    return wrapper

class Data:
    def __init__(self, filepath):  # Using "filepath" parameter for this example
        self.filepath = Path(filepath)

    @filecache
    def download(self, use_cache=True):
        print(f"downloading to {str(self.filepath)!r}")
        return self.filepath  # Also added return here to mirror wrapper

Usage:

>>> root = Data('/')
>>> root.filepath.exists()
True
>>> root.download()  # Uses cache
PosixPath('/')
>>> root.download(use_cache=False)  # Doesn't use cache
downloading to '/'
PosixPath('/')
>>> dne = Data('/DNE')
>>> dne.filepath.exists()
False
>>> dne.download()  # Doesn't use cache
downloading to '/DNE'
PosixPath('/DNE')

In theory, you can then remove the parameter use_cache=True from download(), but then it's missing from help(download) and I don't know how to add it. This answer has a solution, but it relies on implementation details. One way to see the actual arguments is inspect.signature(follow_wrapped=False):

>>> inspect.signature(Data.download, follow_wrapped=False)
<Signature (self, *args, use_cache=True, **kwargs)>
wjandrea
  • 28,235
  • 9
  • 60
  • 81