I have created a couple of Superset dashboards in a production environment, and I have read that using a Redis cache is recommended in production. A similar StackOverflow question here.

Firstly, I would like to understand what I will achieve by adding the following code to superset_config.py:

FILTER_STATE_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': 86400,
    'CACHE_KEY_PREFIX': 'superset_filter_',
    'CACHE_REDIS_URL': 'redis://localhost:6379/2'
}

Secondly, I would like to know how I can make a Superset dashboard auto-refresh permanently, rather than only for the current session, which is the only option currently available. Is there any way? (Similar question in this thread.)

Thanks in advance for your help.

-- UPDATE 26.07.2022 (after more research)

Reference links: official doc, issue 390

I have added the following dictionaries in my superset_config.py file:

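from datetime import timedelta

# CacheConfig is imported from superset.superset_typing in recent releases
# (older releases expose it as superset.typing)
from superset.superset_typing import CacheConfig

# Default cache for Superset objects
CACHE_CONFIG: CacheConfig = {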
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': int(timedelta(days=1).total_seconds()),
    'CACHE_KEY_PREFIX': 'superset_cache_',
    'CACHE_REDIS_URL': 'redis://redis:6379/2'
}

# Cache for datasource metadata and query results
DATA_CACHE_CONFIG: CacheConfig = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': int(timedelta(days=1).total_seconds()),
    'CACHE_KEY_PREFIX': 'superset_data_',
    'CACHE_REDIS_URL': 'redis://redis:6379/3'
}

FILTER_STATE_CACHE_CONFIG: CacheConfig = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': int(timedelta(days=1).total_seconds()),
    'CACHE_KEY_PREFIX': 'superset_filter_',
    'CACHE_REDIS_URL': 'redis://redis:6379/4'
}

EXPLORE_FORM_DATA_CACHE_CONFIG: CacheConfig = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': int(timedelta(days=1).total_seconds()),
    'CACHE_KEY_PREFIX': 'superset_explore_',
    'CACHE_REDIS_URL': 'redis://redis:6379/5'
}

The Superset app started successfully, and when I refresh a dashboard I can see the query running in redis-cli. My concern is that every time I apply filters on the dashboards, the data is re-cached. Shouldn't the caching be applied once for every filter in the datasource, so that when I apply filters Superset won't have to hit the DB to fetch new records?

NikSp

1 Answer

I'm new to Superset and caching, but here's my understanding.

If a reader opens a report that has recently been cached, instead of the report being run again, a cached version is fetched from Redis. Redis can store and transmit data faster than a relational database - that's one advantage. This reduces traffic on the database as well.

The other advantage is that the cached report doesn't need to be re-computed, saving the need to run a potentially long-running and expensive query on your source database.
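
To make the hit-or-miss behaviour concrete, here is a minimal sketch of the cache-aside pattern. It is not Superset's actual implementation: a plain dict stands in for Redis, and `run_expensive_query`, `cache_key` and `get_chart_data` are hypothetical names. The point it illustrates is that the key is derived from the full set of query parameters, filters included, so a filter combination that has not been seen before is a cache miss and goes to the database once.

```python
import hashlib
import json

# Stand-in for Redis: a plain dict, so the sketch runs without a server.
cache = {}
db_calls = []  # records every time the "database" is actually queried


def run_expensive_query(params):
    """Hypothetical placeholder for a slow query against the source database."""
    db_calls.append(params)
    return {"params": params}


def cache_key(params):
    """Derive a key from all query parameters, filters included."""
    payload = json.dumps(params, sort_keys=True)
    return "superset_data_" + hashlib.sha256(payload.encode()).hexdigest()


def get_chart_data(params):
    key = cache_key(params)
    if key in cache:
        # Cache hit: the database is not touched.
        return cache[key]
    # Cache miss: query once, then store the result for later requests.
    result = run_expensive_query(params)
    cache[key] = result
    return result


get_chart_data({"chart": 1, "filters": {"country": "GR"}})  # miss
get_chart_data({"chart": 1, "filters": {"country": "GR"}})  # hit
get_chart_data({"chart": 1, "filters": {"country": "DE"}})  # miss: new filter, new key
print(len(db_calls))
```

Under this model, applying a new filter value stores a new entry, while repeating an earlier filter selection is served from the cache until the timeout expires.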

You can watch this happen by entering your Redis instance (for Docker, that means entering the container with docker exec -it superset_worker /bin/bash) and then running redis-cli. There, run MONITOR to see every command that Redis receives. In the main Superset app, you can also see when a report is fetched from the cache. With both open, reload reports in your browser to generate the activity.

Your block of code sets up caching only for filter state. You need similar blocks of code to set up caching for data, etc. Check the config.py file to see all the cache values that can be overridden, then supply blocks of code similar to the one in your post for the other caches.
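
One way to avoid repeating the same dictionary for each cache is a small helper in superset_config.py. This is only a sketch: `redis_cache` is a hypothetical helper name, and the `redis` host, port and database numbers are taken from the question and would need to match your own deployment.

```python
from datetime import timedelta


def redis_cache(prefix, db, host="redis", port=6379):
    """Build one Flask-Caching style config dict for a Redis backend (hypothetical helper)."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": int(timedelta(days=1).total_seconds()),
        "CACHE_KEY_PREFIX": prefix,
        "CACHE_REDIS_URL": f"redis://{host}:{port}/{db}",
    }


# One entry per cache that config.py allows you to override.
CACHE_CONFIG = redis_cache("superset_cache_", 2)
DATA_CACHE_CONFIG = redis_cache("superset_data_", 3)
FILTER_STATE_CACHE_CONFIG = redis_cache("superset_filter_", 4)
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache("superset_explore_", 5)
```

Keeping a distinct key prefix per cache makes it easier to tell the entries apart when inspecting Redis.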

Here's a related post from Chartio, a Business Intelligence company that closed after being acquired, about the role of caching in serving their reports more efficiently.

Sam Firke
  • Hey Sam, thanks for your answer. I still do not fully understand the cache. Should I apply caching on specific dashboards? on specific filters? and the argument ```FILTER_STATE_CACHE_CONFIG``` is it a sufficient configuration? As you wrote, caching will save the state of a dashboard and won't have to run an expensive query every time someone opens a dashboard. This way, will the data be also refreshed? – NikSp Jul 26 '22 at 07:48
  • Sam please check my updated answer – NikSp Jul 26 '22 at 10:34