
I am using the Python library diskcache and its decorator `@cache.memoize` to cache calls to my CouchDB database. It works fine. However, I would like to print to the user whether the data is returned from the database or from the cache.

I don't even know how to approach this problem.

My code so far:

import couchdb
from diskcache import Cache

cache = Cache("couch_cache")


@cache.memoize()
def fetch_doc(url: str, database: str, doc_id: str) -> dict:

    server = couchdb.Server(url=url)
    db = server[database]

    return dict(db[doc_id])
martineau
Leevi L
  • It puzzles me why you would want to slow down something meant to speed up execution of the function. Regardless, to do what you want with one will require you to write your own decorator. Suggest you start by looking at the source code for `diskcache.Cache.memoize()` (which is pure-Python). – martineau Feb 22 '21 at 22:37

1 Answer


Here's a way, but I don't really recommend it, because (1) it adds an extra operation of manually checking the cache yourself, and (2) it probably duplicates what the library is already doing internally. I haven't properly measured the performance impact, since I don't have a production environment with varied doc_ids, but as martineau's comment says, the extra lookup operation could slow things down.

But here it goes.

The diskcache.Cache object "supports a familiar Python mapping interface" (like dicts). You can therefore manually check whether a given key is already present in the cache, using the same key that is automatically generated from the arguments to the memoize-d function:

An additional __cache_key__ attribute can be used to generate the cache key used for the given arguments.

>>> key = fibonacci.__cache_key__(100)
>>> print(cache[key])
354224848179261915075

So, you can wrap your fetch_doc function in another function that checks whether a cache key based on the url, database, and doc_id arguments already exists, prints the result to the user, and then calls the actual fetch_doc function:

import couchdb
from diskcache import Cache

cache = Cache("couch_cache")

@cache.memoize()
def fetch_doc(url: str, database: str, doc_id: str) -> dict:
    server = couchdb.Server(url=url)
    db = server[database]
    return dict(db[doc_id])

def fetch_doc_with_logging(url: str, database: str, doc_id: str):
    # Generate the key
    key = fetch_doc.__cache_key__(url, database, doc_id)

    # Print out whether getting from cache or not
    if key in cache:
        print(f'Getting {doc_id} from cache!')
    else:
        print(f'Getting {doc_id} from DB!')

    # Call the actual memoize-d function
    return fetch_doc(url, database, doc_id)

When testing that out with:

url = 'https://your.couchdb.instance'
database = 'test'
doc_id = 'c97bbe3127fb6b89779c86da7b000885'

cache.stats(enable=True, reset=True)
for _ in range(5):
    fetch_doc_with_logging(url, database, doc_id)
print(f'(hits, misses) = {cache.stats()}')

# Only for testing, so 1st call will always miss and will get from DB
cache.clear()

It outputs:

$ python test.py 
Getting c97bbe3127fb6b89779c86da7b000885 from DB!
Getting c97bbe3127fb6b89779c86da7b000885 from cache!
Getting c97bbe3127fb6b89779c86da7b000885 from cache!
Getting c97bbe3127fb6b89779c86da7b000885 from cache!
Getting c97bbe3127fb6b89779c86da7b000885 from cache!
(hits, misses) = (4, 1)

You can turn that wrapper function into a decorator:

from functools import wraps

def log_if_cache_or_not(memoized_func):
    @wraps(memoized_func)
    def _wrap(*args):
        key = memoized_func.__cache_key__(*args)
        # Assumes doc_id is the last positional argument
        doc_id = args[-1]
        if key in cache:
            print(f'Getting {doc_id} from cache!')
        else:
            print(f'Getting {doc_id} from DB!')
        return memoized_func(*args)

    return _wrap

@log_if_cache_or_not
@cache.memoize()
def fetch_doc(url: str, database: str, doc_id: str) -> dict:
    server = couchdb.Server(url=url)
    db = server[database]
    return dict(db[doc_id])

for _ in range(5):
    fetch_doc(url, database, doc_id)

Or as suggested in the comments, combine it into 1 new decorator:

from functools import wraps

def memoize_with_logging(func):
    memoized_func = cache.memoize()(func)

    @wraps(memoized_func)
    def _wrap(*args):
        key = memoized_func.__cache_key__(*args)
        # Assumes doc_id is the last positional argument
        doc_id = args[-1]
        if key in cache:
            print(f'Getting {doc_id} from cache!')
        else:
            print(f'Getting {doc_id} from DB!')
        return memoized_func(*args)

    return _wrap

@memoize_with_logging
def fetch_doc(url: str, database: str, doc_id: str) -> dict:
    server = couchdb.Server(url=url)
    db = server[database]
    return dict(db[doc_id])

for _ in range(5):
    fetch_doc(url, database, doc_id)
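
The mechanics of this pattern can be exercised without CouchDB or diskcache at all. Below is a minimal, self-contained sketch that mimics the same hit-or-miss logging with a plain dict as the cache backend; the tuple key and the dict are stand-ins for diskcache's real key generation and storage, not its actual implementation:

```python
from functools import wraps

def dict_memoize_with_logging(store):
    """Toy stand-in for cache.memoize() that logs cache hits and misses.

    `store` is a plain dict; the key is just the function name plus the
    tuple of positional arguments, which only mimics diskcache's key
    generation for illustration.
    """
    def decorator(func):
        @wraps(func)
        def _wrap(*args):
            key = (func.__name__,) + args
            if key in store:
                print(f'Getting {args[-1]} from cache!')
            else:
                print(f'Getting {args[-1]} from DB!')
                store[key] = func(*args)
            return store[key]
        return _wrap
    return decorator

store = {}

@dict_memoize_with_logging(store)
def fetch_doc_fake(url, database, doc_id):
    # Pretend this hits the database
    return {'_id': doc_id}

for _ in range(3):
    fetch_doc_fake('http://example', 'test', 'abc')
```

The first call misses and populates the dict; the remaining calls hit. This keeps the behaviour observable without a running CouchDB instance.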

Some quick testing:

In [9]: %timeit for _ in range(100000): fetch_doc(url, database, doc_id)
13.7 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]: %timeit for _ in range(100000): fetch_doc_with_logging(url, database, doc_id)
21.2 s ± 637 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

(A better benchmark would vary doc_id randomly across calls.)

Again, as I mentioned at the start, caching and memoize-ing a function call is supposed to speed up that function. This answer adds the extra operations of a cache lookup and printing/logging whether you are fetching from the DB or from the cache, and that can affect the performance of the call. Test appropriately.
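
Picking up the idea from the comments of memoize-ing the function only if it's not already memoized: a minimal sketch, which relies on the `__cache_key__` attribute that `memoize()` attaches to its wrapper (documented behaviour, but not an explicit "is memoized" API):

```python
def memoize_if_needed(cache, func):
    """Wrap func with cache.memoize() unless it is already memoized.

    Detection uses the __cache_key__ attribute that diskcache's
    memoize() attaches to its wrapper; `cache` is assumed to be a
    diskcache.Cache (or anything with a compatible memoize()).
    """
    if hasattr(func, '__cache_key__'):
        return func  # already memoized; wrapping again would double-cache
    return cache.memoize()(func)
```

Applied twice, the second call is a no-op, so accidentally stacking two layers of memoization is avoided.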

Gino Mempin
    Your decorator can get as input normal functions and pass them itself through `cache.memoize()` so you wouldn't need 2 decorators. Something like `def my_memoize(func): memoized_func = cache.memoize()(func)` – Roy Cohen Feb 23 '21 at 12:25
  • @RoyCohen Ahh, yes that's also possible. Though that would combine both logging and memoization into one function, and I recommend making 1 function just do 1 thing. Also, with logging as a separate decorator, it makes it reusable and easy to plug-in/out. – Gino Mempin Feb 23 '21 at 12:29
  • But `log_if_cache_or_not` requires its parameter to be memoized, so it makes sense to memoize the function inside it. If you're concerned with usability, you can add an optional argument to turn it off. – Roy Cohen Feb 23 '21 at 15:10
  • Another idea I just thought of is to memoize the function only if it's not already memoized. I don't know if there's a better way but I think `hasattr(func, '__cache_key__')` would work. – Roy Cohen Feb 24 '21 at 13:17