3

I used to cache a database query in a global variable to speed up my application. Since this is strongly inadvisable (and it did cause problems), I want to use one of Django's cache backends instead. I tried LocMemCache and DatabaseCache, but both take about 15 seconds to set my variable (twice as long as it takes to generate the data, which is 7 MB in size).

Is that expected? Am I doing something wrong?

(Memcached is limited to 1 MB per value, and I cannot split my data, which consists of arbitrarily large binary masks.)

Edit: FileBasedCache takes 30s to set as well.

settings.py:

CACHES = {
    'default': {...},
    'stats': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache', 
        # or 'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'stats',
    },
}

service.py:

from django.core.cache import caches

def stats_service():
    stats_cache = caches['stats']
    if stats_cache.get('key') is None:
        stats_cache.set('key', data)  # 15s with DatabaseCache, 30s with LocMemCache
    return stats_cache.get('key')

Global variable (super fast) version:

_cache = {}

def stats_service():
    if _cache.get('key') is None:
        _cache['key'] = data
    return _cache['key']
JulienD
    The cache pickles the value, I'm not surprised that it takes such a long time to pickle a 7MB value. Depending on what you're caching and what you're using it for, there might be better ways. – knbk Feb 12 '16 at 15:24
  • That explains it indeed, I totally missed that point. I absolutely don't want to pickle it (obviously 7MB RAM is not an issue). I am caching bit masks (binary numpy arrays) that I reuse in every computation. Would you have any suggestion ? – JulienD Feb 12 '16 at 15:29
  • I found this: https://djangosnippets.org/snippets/2396/. My only fear with a global dict is that I use multiprocessing in computations using the cached arrays. – JulienD Feb 12 '16 at 15:40
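To check whether serialization is really the bottleneck (as the comment above suggests), one can time pickling a mask of roughly the size in question against a raw-bytes dump. This is only an illustrative sketch; the array contents are made up:

```python
import pickle
import time

import numpy as np

# A deterministic boolean mask roughly the size in the question (~7 MB).
mask = (np.arange(7 * 1024 * 1024) % 3 == 0)

start = time.perf_counter()
blob = pickle.dumps(mask, protocol=pickle.HIGHEST_PROTOCOL)
print("pickle.dumps:", time.perf_counter() - start)

start = time.perf_counter()
raw = mask.tobytes()
print("tobytes:", time.perf_counter() - start)

# Both round-trip losslessly; the difference is only serialization overhead.
assert np.array_equal(pickle.loads(blob), mask)
assert np.array_equal(np.frombuffer(raw, dtype=bool), mask)
```

Note that the Django cache backends add their own overhead on top of pickling (a database write for DatabaseCache, file I/O for FileBasedCache), so the timings above only account for part of the 15 seconds.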

2 Answers

2

One option may be to use diskcache.DjangoCache. DiskCache extends the Django cache API to support writing and reading binary streams as-is (avoiding pickling). It works particularly well for large values (like those greater than 1 MB). DiskCache is an Apache2-licensed, disk- and file-backed cache library, written in pure Python and compatible with Django.

In your case, you could use the ndarray.tobytes and numpy.frombuffer methods (tostring/fromstring in older NumPy releases) to quickly convert to and from raw bytes. Then wrap the bytes with io.BytesIO to store/retrieve them in the cache. For example:

import io

import numpy
from django.core.cache import cache

value = cache.get('cache-key', read=True)

if value is not None:
    # frombuffer needs the dtype; a multi-dimensional shape
    # would have to be stored and restored separately.
    data = numpy.frombuffer(value.read(), dtype=bool)
    value.close()
else:
    data = ...  # Generate 7MB array.
    cache.set('cache-key', io.BytesIO(data.tobytes()), read=True)

DiskCache extends the Django cache API by permitting file-like values which are stored as binary blobs on disk. The Django cache benchmarks page has a discussion and comparison of alternative cache backends.
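One detail the example above glosses over: raw bytes carry neither the dtype nor the shape of the array, so both must be cached alongside the blob to rebuild a multi-dimensional mask. A minimal sketch of the round-trip, with a made-up mask standing in for the question's data:

```python
import io

import numpy as np

# Hypothetical 2-D boolean mask standing in for the question's data (~7 MB).
mask = np.zeros((2048, 3584), dtype=bool)
mask[::2] = True

# Store: the raw bytes plus the metadata needed to rebuild the array.
blob = io.BytesIO(mask.tobytes())
meta = (mask.dtype, mask.shape)  # keep this next to the blob in the cache

# Retrieve: frombuffer needs the dtype, and reshape restores the dimensions.
dtype, shape = meta
restored = np.frombuffer(blob.getvalue(), dtype=dtype).reshape(shape)

assert np.array_equal(restored, mask)
```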

GrantJ
  • Thanks, I am definitely going to try this. I'd really like to use fast memory access instead of disk and avoid a conversion, though. – JulienD Mar 21 '16 at 08:08
  • @muraveill To avoid using files on disk, increase the `large_value_threshold` setting ([API docs](http://www.grantjenks.com/docs/diskcache/api.html#diskcache.DEFAULT_SETTINGS)). Cache writes still persist on disk but reads will happen from a synchronized memory-mapped file. In that case, don't wrap the values with `io.StringIO` just pass the raw byte strings. They won't be pickled. You may need to also increase other cache settings to use more memory. – GrantJ Mar 21 '16 at 17:27
0

This snippet actually works fine: https://djangosnippets.org/snippets/2396/

As I understand it, the only problem with using global variables for caching is thread safety, and this no-pickle version is thread-safe.
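For illustration (this is not the linked snippet itself, just a sketch of the idea), a thread-safe in-process cache can be as small as a dict guarded by a lock. Values are kept by reference and never pickled, so storing a 7 MB array is essentially free:

```python
import threading


class ThreadSafeCache:
    """Minimal in-process cache: values are kept by reference, never pickled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value


_cache = ThreadSafeCache()


def stats_service(generate):
    # generate() stands in for the expensive 7MB computation.
    data = _cache.get('key')
    if data is None:
        data = generate()
        _cache.set('key', data)
    return data
```

One caveat for the multiprocessing concern raised in the comments: this cache lives in a single process, so each worker process gets its own independent copy; it is thread-safe, not process-shared.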

JulienD