
I'm writing a Django management command to handle some of our Redis caching. Basically, I need to select all keys that conform to a certain pattern (for example: "prefix:*") and delete them.

I know I can use the cli to do that:

redis-cli KEYS "prefix:*" | xargs redis-cli DEL

But I need to do this from within the app, so I need to use the Python bindings (I'm using py-redis). I have tried feeding a list into delete, but it fails:

from common.redis_client import get_redis_client
cache = get_redis_client()
x = cache.keys('prefix:*') 

x == ['prefix:key1','prefix:key2'] # True

# And now

cache.delete(x) 

# returns 0 . nothing is deleted

I know I can iterate over x:

for key in x:
   cache.delete(key)

But that would throw away Redis's awesome speed and misuse its capabilities. Is there a pythonic solution with py-redis, without iteration and/or the cli?

Thanks!

– alonisser

10 Answers


Use SCAN iterators: https://pypi.python.org/pypi/redis

for key in r.scan_iter("prefix:*"):
    r.delete(key)
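
As a comment below notes, scan_iter also accepts a count hint, so you don't have to manage the cursor yourself. A minimal sketch, assuming a default local Redis connection:

from redis import Redis

r = Redis()  # assumes a local Redis instance

# count only hints how many keys each underlying SCAN call fetches;
# scan_iter still yields keys one at a time
for key in r.scan_iter("prefix:*", count=1000):
    r.delete(key)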
– Alex Toderita
    django-redis implements delete_pattern which does something very similar to this, see https://github.com/niwinz/django-redis/blob/master/django_redis/client/default.py#L264 . – Robert Lujo Nov 23 '17 at 10:35
  • I cannot find `scan_iter` in the Redis Python documentation. Has it been removed? – Dan Nissenbaum Mar 08 '23 at 00:49

Here is a full working example using py-redis:

from redis import StrictRedis
cache = StrictRedis()

def clear_ns(ns):
    """
    Clears a namespace
    :param ns: str, namespace e.g. your:prefix
    :return: int, cleared keys
    """
    count = 0
    ns_keys = ns + '*'
    for key in cache.scan_iter(ns_keys):
        cache.delete(key)
        count += 1
    return count

You can also use scan_iter to pull all the keys into memory and then pass them all to delete for a bulk delete, but that may take a good chunk of memory for larger namespaces, so it's probably best to run a delete for each key.
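
A minimal sketch of that bulk variant, assuming the same cache client as above (note the guard: DEL with zero arguments is an error):

def clear_ns_bulk(ns):
    """Bulk-delete a namespace; holds every matching key in memory."""
    keys = list(cache.scan_iter(ns + '*'))
    if keys:  # cache.delete() with no keys raises a ResponseError
        cache.delete(*keys)
    return len(keys)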

Cheers!

UPDATE:

Since writing the answer, I started using the pipelining feature of Redis to send all commands in one request and avoid network latency:

from redis import StrictRedis
cache = StrictRedis()

def clear_cache_ns(ns):
    """
    Clears a namespace in redis cache.
    This may be very time consuming.
    :param ns: str, namespace e.g. your:prefix*
    :return: int, num cleared keys
    """
    count = 0
    pipe = cache.pipeline()
    for key in cache.scan_iter(ns):  # ns must already include the pattern, e.g. 'your:prefix*'
        pipe.delete(key)
        count += 1
    pipe.execute()
    return count

UPDATE2 (Best Performing):

If you use scan instead of scan_iter, you can control the chunk size and iterate through the cursor using your own logic. This also seems to be a lot faster, especially when dealing with many keys. If you add pipelining to this you get a further 10-25% performance boost depending on chunk size, at the cost of memory usage, since you will not send the execute command to Redis until everything is generated (a pipelined variant is sketched after the code below). So I stuck with scan:

from redis import StrictRedis
cache = StrictRedis()
CHUNK_SIZE = 5000

def clear_ns(ns):
    """
    Clears a namespace
    :param ns: str, namespace e.g. your:prefix
    :return: bool, True once the namespace is cleared
    """
    cursor = '0'  # str sentinel so the loop body runs; redis-py returns the cursor as an int
    ns_keys = ns + '*'
    while cursor != 0:
        cursor, keys = cache.scan(cursor=cursor, match=ns_keys, count=CHUNK_SIZE)
        if keys:
            cache.delete(*keys)

    return True
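
The scan & pipe timings in the benchmarks below combine the two ideas; a rough sketch of that variant, assuming the same cache client and CHUNK_SIZE as above:

def clear_ns_pipe(ns):
    """Scan in chunks, queue the deletes on a pipeline, send them at the end."""
    cursor = '0'
    ns_keys = ns + '*'
    pipe = cache.pipeline()
    while cursor != 0:
        cursor, keys = cache.scan(cursor=cursor, match=ns_keys, count=CHUNK_SIZE)
        if keys:
            pipe.delete(*keys)
    pipe.execute()  # all queued DELs go to Redis in one round trip
    return True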

Here are some benchmarks:

5k chunks using a busy Redis cluster:

Done removing using scan in 4.49929285049
Done removing using scan_iter in 98.4856731892
Done removing using scan_iter & pipe in 66.8833789825
Done removing using scan & pipe in 3.20298910141

5k chunks and a small idle dev redis (localhost):

Done removing using scan in 1.26654982567
Done removing using scan_iter in 13.5976779461
Done removing using scan_iter & pipe in 4.66061878204
Done removing using scan & pipe in 1.13942599297
– radtek
    I provided a full working example. Also would like others to comment on scan_iter vs bulk delete – radtek Jul 18 '17 at 18:27
  • Great answer, this should be the correct answer. Today I actually needed this answer myself and am preferring your answer above mine, although there are some minor errors in your example, like the missing ns_keys variable in your first update, and the :: within your second update. – Blackeagle52 Mar 28 '19 at 10:52
  • Thanks, but I actually don't use scanning in prod because it's so slow; instead I end up caching every key in the namespace and doing a bulk delete that way (see the sketch after these comments). Seems like overkill, I know, but the performance is best because you don't have to scan the cache at all. – radtek Mar 28 '19 at 16:14
  • FYI, there's a syntax error on line 13 of your third example; you have two colons (:) at the end of your while condition. – Joshua Davies Dec 26 '19 at 14:53
  • UPDATE2 is priceless in production (the default `chunk_size` would be an order of magnitude slower!). chunk_size vs. execute time: 5 → 1m 44.8s, 50 → 17.0s, 500 → 7.3s, 5000 → timeout reading from socket – mirekphd Jul 05 '20 at 11:20
  • Be very careful when using high values of `chunk_size` in PRD, increase size slowly, starting from 1, or else you can easily cause cache read timeouts! Chunk size has direct impact on `redis-server` CPU utilization (e.g. reaching 90% with `chunk_size` of 170 for a 30+ gig database with millions of keys, where `scan`+`delete` takes about 4 minutes for this maximum safe chunk). – mirekphd Jul 09 '20 at 11:31
  • It all depends on your instance size, fine tuning will be required, 5k chunk size was perfect for me. – radtek Jul 10 '20 at 14:44
  • The typing between `cursor = '0'` and `while cursor != 0` is awkward. You could use a `cursor = None` and `cache.scan(cursor=cursor or 0, ...)` to make it slightly better – Jivan Mar 07 '21 at 12:19
  • Note that even when this answer was written in 2017, redis-py has allowed you to provide a chunksize to `scan_iter`. You don't have to manage the cursor yourself. (https://github.com/andymccurdy/redis-py/blob/25c46abdebbf60c599e6b9fcd7a4532bd8272a55/redis/client.py#L1479-L1492) – Kyle Barron Sep 20 '21 at 22:04
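
A guess at what the no-scan approach from radtek's comment above could look like; ns_set, cache_set, and clear_ns_no_scan are hypothetical names for illustration, not radtek's actual code. The idea is to record every key you write in a Redis set per namespace, then bulk-delete from that set instead of scanning:

def ns_set(ns):
    """Name of the index set tracking every key in a namespace."""
    return ns + ':keys'

def cache_set(ns, key, value):
    """Write a value and remember its key, so clearing never needs SCAN."""
    cache.set(key, value)
    cache.sadd(ns_set(ns), key)

def clear_ns_no_scan(ns):
    keys = cache.smembers(ns_set(ns))
    if keys:
        cache.delete(*keys)
    cache.delete(ns_set(ns))  # drop the index set itself
    return len(keys)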

I think the

 for key in x: cache.delete(key)

is pretty good and concise. delete really wants one key at a time, so you have to loop.

Otherwise, this previous question and answer points you to a Lua-based solution.

– Dirk Eddelbuettel

From the Documentation

delete(*names)
    Delete one or more keys specified by names

This just wants an argument per key to delete and then it will tell you how many of them were found and deleted.

In the case of your code above I believe you can just do:

    cache.delete(*x)

But I will admit I am new to Python, so I just do:

    deleted_count = redis.delete('key1', 'key2')
– James

Btw, with django-redis you can use the following (from https://niwinz.github.io/django-redis/latest/):

from django.core.cache import cache
cache.delete_pattern("foo_*")
– Gleb

The cache.delete(*keys) solution works fine, but make sure keys isn't empty, to avoid a redis.exceptions.ResponseError: wrong number of arguments for 'del' command.

If you are sure that you will always get a result: cache.delete(*cache.keys('prefix:*'))
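
Otherwise, a small guard avoids the error; a sketch using the cache client from the question:

keys = cache.keys('prefix:*')  # see the comment below about KEYS in prod
if keys:  # DEL requires at least one key
    cache.delete(*keys)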

– Blackeagle52
  • Don't use `cache.keys()` in prod, it's intended for debugging: https://redis.io/commands/keys – radtek Apr 22 '17 at 23:17

You can use a specific pattern to match all keys and delete them:

import redis
client = redis.Redis(host='192.168.1.106', port=6379,
                password='pass', decode_responses=True)
# note: KEYS blocks the server while it scans the whole keyspace,
# so prefer scan_iter for large production datasets
for key in client.keys('prefix:*'):
    client.delete(key)
– Lynn Han

According to my test, it costs too much time if I use the scan_iter solution (as Alex Toderita wrote).

Therefore, I prefer to use:

from redis.connection import ResponseError

try:
    # KEYS collects every match; unpack feeds them all to a single DEL
    redis_obj.eval('''return redis.call('del', unpack(redis.call('keys', ARGV[1])))''', 0, 'prefix:*')
except ResponseError:
    # raised when no keys match: DEL would be called with zero arguments
    pass

The prefix:* is the pattern.


Refers to: https://stackoverflow.com/a/16974060

– carton.swing

Use delete_pattern: https://niwinz.github.io/django-redis/latest/

from django.core.cache import cache
cache.delete_pattern("prefix:*")
– Jijo

The answer suggested by @radtek did not work for me, since the keys were getting deleted while iterating, which leads to unexpected behavior. Here's an example:

from redis import StrictRedis
cache = StrictRedis()

for i in range(0, 10000):
    cache.set(f'test_{i}', 1)


cursor = '0'
SCAN_BATCH_SIZE = 5000
while cursor != 0:
    cursor, keys = cache.scan(cursor=cursor, match='test_*', count=SCAN_BATCH_SIZE)
    if keys:
        cache.delete(*keys)


## Iteration 1
# cursor=5000, keys=['test_0', .... , 'test_4999']
# keys will get deleted

## Iteration 2
# cursor=0, keys=[]
# No remaining keys are found reason being, there are just the 
# 5000 entries left post deletion and the cursor position is already 
# at 5000. Hence, no keys are returned.

You can use a Redis pipeline to solve this issue, as shown below. The deletes are only queued client-side during the scan and are sent to Redis by pipe.execute() after the scan has finished, so the scan cursor never observes a partially deleted keyspace:

from redis import StrictRedis
cache = StrictRedis()

for i in range(0, 10000):
    cache.set(f'test_{i}', 1)


pipe = cache.pipeline()
cursor = None
SCAN_BATCH_SIZE = 5000
while cursor != 0:
    cursor, keys = cache.scan(cursor=cursor or 0, match='test_*', count=SCAN_BATCH_SIZE)
    if keys:
        pipe.delete(*keys)
pipe.execute()
– Pankaj Saini