Google App Engine may route different web requests to different processes, or even to different physical machines. This makes it harder to maintain global state across requests, that is, to implement local caches of the data.
When data modifications happen, you have to be careful to invalidate the local caches in every process (the cache coherence problem).
Furthermore, if your GAE application is declared as threadsafe, a single process may handle multiple requests at the same time, in different threads.
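To illustrate the threading concern, here is a minimal sketch that runs outside App Engine. Several threads share one module-level dictionary, and a lock makes the check-and-fill step atomic so the expensive load happens only once. The names `load_expensive_value`, `get_or_load` and `run_demo` are hypothetical and exist only for this illustration.

```python
import threading

_cache = {}               # module-level state, shared by every thread in the process
_lock = threading.Lock()
provider_calls = []       # one entry per (hypothetical) expensive load


def load_expensive_value():
    """Hypothetical stand-in for a slow datastore query."""
    provider_calls.append(1)
    return "value"


def get_or_load(key):
    # Check and fill inside one critical section, so only the first
    # thread to arrive performs the expensive load.
    with _lock:
        if key not in _cache:
            _cache[key] = load_expensive_value()
        return _cache[key]


def run_demo(thread_count=8):
    results = []

    def worker():
        results.append(get_or_load("settings"))

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_demo()
```

Without the lock, two threads could both see the key as missing and both call the loader. That is wasteful at best, and inconsistent if the underlying data changes between the two loads.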
I sketched a possible solution:
- keep the data in-process using a global dictionary
- keep track of the version of the in-process data using a global dictionary
- keep the gold version of the data in a tiny memcache record (only the version tag, not the actual data, of course)
- when in-process local data is stale (invalid), fetch it from the gold storage (via the value_provider function)
- when appropriate, invalidate in-process data among all machines (by resetting the gold version tag).
Here is the code:
import threading
from uuid import uuid4

from google.appengine.api import memcache

_data = dict()
_versions = dict()
lock = threading.Lock()
TIME = 60 * 10  # 10 minutes


def get(key, value_provider):
    """
    Gets a value from the in-process storage (cache).
    If the value is not available in the in-process storage
    or it is invalid (stale), then it is fetched by calling the 'value provider'.
    """
    # Fast check, read-only step (no critical section).
    if _is_valid(key):
        return _data[key]

    # Data is stale (invalid). Perform read+write step (critical section).
    with lock:
        # Check again in case another thread just left the critical section
        # and brought the in-process data to a valid state.
        if _is_valid(key):
            return _data[key]

        version = memcache.get(key)
        # If memcache entry is not initialized
        if not version:
            version = uuid4()
            memcache.set(key, version, time=TIME)

        _data[key] = value_provider()
        _versions[key] = version
        return _data[key]


def _is_valid(key):
    """Whether the in-process data has the latest version (according to memcache entry)."""
    memcache_version = memcache.get(key)
    proc_version = _versions.get(key, None)
    return memcache_version and memcache_version == proc_version


def invalidate(key):
    """Invalidates the in-process cache for all processes."""
    memcache.set(key, uuid4(), time=TIME)
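To see the invalidation flow without deploying to App Engine, the following sketch simulates two processes that share one version store. It is an assumption-laden model, not the App Engine API: `FakeMemcache` is a hypothetical in-memory stand-in for memcache (expiry is ignored), and `ProcessCache` wraps the module-level state of one process in a class so that two "processes" can coexist in a single script.

```python
import threading
from uuid import uuid4


class FakeMemcache(object):
    """Hypothetical in-memory stand-in for the shared memcache service."""

    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._store.get(key)

    def set(self, key, value, time=0):
        # The expiry argument is accepted but ignored in this sketch.
        with self._lock:
            self._store[key] = value


class ProcessCache(object):
    """One instance models the in-process state of a single process."""

    def __init__(self, memcache):
        self._memcache = memcache
        self._data = {}
        self._versions = {}
        self._lock = threading.Lock()

    def _is_valid(self, key):
        shared_version = self._memcache.get(key)
        return (shared_version is not None
                and shared_version == self._versions.get(key))

    def get(self, key, value_provider):
        if self._is_valid(key):          # fast path, no critical section
            return self._data[key]
        with self._lock:
            if self._is_valid(key):      # another thread may have refreshed it
                return self._data[key]
            version = self._memcache.get(key)
            if version is None:
                version = uuid4().hex
                self._memcache.set(key, version)
            self._data[key] = value_provider()
            self._versions[key] = version
            return self._data[key]

    def invalidate(self, key):
        # A new version tag makes every process's local copy stale.
        self._memcache.set(key, uuid4().hex)


shared = FakeMemcache()
process_a = ProcessCache(shared)
process_b = ProcessCache(shared)

gold = {"greeting": "hello"}   # hypothetical gold storage
loads = []                     # one entry per call to the value provider


def provider():
    loads.append(1)
    return gold["greeting"]


first_a = process_a.get("greeting", provider)   # loads from gold storage
first_b = process_b.get("greeting", provider)   # loads from gold storage
again_a = process_a.get("greeting", provider)   # served from process A's cache
loads_before = len(loads)

gold["greeting"] = "hi"
stale_b = process_b.get("greeting", provider)   # still cached: nothing invalidated yet
process_a.invalidate("greeting")
fresh_a = process_a.get("greeting", provider)   # reloads after the version change
fresh_b = process_b.get("greeting", provider)   # reloads too, although A invalidated
```

The point of the simulation is the `stale_b` step: changing the gold data alone is not enough, because each process keeps serving its local copy until some process resets the shared version tag.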
References:
- https://softwareengineering.stackexchange.com/a/222818
- Understanding global object persistence in Python WSGI apps
- Problem declaring global variable in python/GAE
- Python Threads - Critical Section
- https://en.wikipedia.org/wiki/Cache_coherence