121

There is a post about a Redis command to get all available keys, but I would like to do it with Python.

Any way to do this?

fedorqui
tscizzle

6 Answers

176

Use scan_iter()

scan_iter() is superior to keys() for large numbers of keys because it gives you an iterator rather than trying to load all the keys into memory at once.

I had 1B records in my redis and I could never get enough memory to return all the keys at once.

SCANNING KEYS ONE-BY-ONE

Here is a python snippet using scan_iter() to get all keys from the store matching a pattern and delete them one-by-one:

import redis
r = redis.StrictRedis(host='localhost', port=6379, db=0)
for key in r.scan_iter("user:*"):
    # delete the key
    r.delete(key)
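
If you just want to collect the keys rather than delete them, the same iterator works. A minimal sketch (the user:* pattern is just an example); note that redis-py returns bytes unless the client is created with decode_responses=True:

import redis
# decode_responses=True makes scan_iter() yield str instead of bytes
r = redis.StrictRedis(host='localhost', port=6379, db=0, decode_responses=True)
keys = list(r.scan_iter("user:*"))
print(keys)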

SCANNING IN BATCHES

If you have a very large list of keys to scan - for example, more than 100k keys - it will be more efficient to scan them in batches, like this:

import redis
from itertools import izip_longest  # Python 2; on Python 3 this is zip_longest

r = redis.StrictRedis(host='localhost', port=6379, db=0)

# iterate a list in batches of size n
def batcher(iterable, n):
    args = [iter(iterable)] * n
    return izip_longest(*args)

# in batches of 500 delete keys matching user:*
for keybatch in batcher(r.scan_iter('user:*'), 500):
    # izip_longest pads the final batch with None, so filter those out
    r.delete(*filter(None, keybatch))

I benchmarked this script and found that using a batch size of 500 was 5 times faster than scanning keys one-by-one. I tested different batch sizes (3, 50, 500, 1000, 5000) and found that a batch size of 500 seems to be optimal.

Note that whether you use the scan_iter() or keys() method, the operation is not atomic and could fail part way through.
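
If you want each batch of deletes applied atomically, one option (my sketch, not part of the original benchmark) is to queue the deletes on a redis-py pipeline, which wraps each batch in MULTI/EXEC:

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)

# each execute() sends the queued DELETEs as one MULTI/EXEC transaction,
# so a batch applies fully or not at all (the scan itself is still not
# atomic with respect to concurrent writers)
pipe = r.pipeline(transaction=True)
for i, key in enumerate(r.scan_iter('user:*'), 1):
    pipe.delete(key)
    if i % 500 == 0:
        pipe.execute()  # flush a full batch of 500
pipe.execute()  # flush the final partial batch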

DEFINITELY AVOID USING XARGS ON THE COMMAND-LINE

I do not recommend this example I found repeated elsewhere. It will fail for unicode keys and is incredibly slow for even moderate numbers of keys:

redis-cli --raw keys "user:*" | xargs redis-cli del

In this example xargs creates a new redis-cli process for every key! That's bad.

I benchmarked this approach as 4 times slower than the first python example (which deleted every key one-by-one) and 20 times slower than deleting in batches of 500.

maxymoo
Patrick Collins
  • I keep getting "redis.exceptions.ResponseError: unknown command 'SCAN'" when iterating over r.scan_iter(). Any idea why? I haven't found an answer yet. – BringBackCommodore64 Mar 13 '17 at 14:34
  • 1
    @BringBackCommodore64 Your version of redis is too old, install a new one. – piokuc Nov 11 '17 at 22:45
  • @piokuc Well, I haven't upgraded my redis but your guess seems obviously right! – BringBackCommodore64 Nov 14 '17 at 12:26
  • @BringBackCommodore64 It's not a guess. I had the same problem, upgrade solved that. Can't remember the version that I had that didn't support SCAN, but it was a few years old. Any recent version of Redis should be OK. – piokuc Nov 14 '17 at 21:47
  • what is `user:*` for? – Lei Yang Oct 31 '19 at 03:18
  • 1
    @LeiYang redis search allows globs/wildcards. So "mykey*", "user_*", "user:*". https://redis.io/commands/keys – Patrick Collins Jan 24 '20 at 03:08
  • @PatrickCollins any idea of how to pass a codec while reading ? – roottraveller Feb 17 '20 at 10:18
  • 8
    izip_longest was renamed to zip_longest in Python 3 https://stackoverflow.com/questions/38634810/failing-to-import-itertools-in-python-3-5-2 – NealWalters Oct 27 '20 at 17:37
  • 2
    The "scanning in batches" section here is misleading here. You've probably got a better performance but it's not related to fetching of the keys which is what that question is about. The better performance that you've got is probably from deleting the keys in batches instead of 1 by 1. – scdekov Dec 22 '20 at 08:33
  • AVOID RUNNING THIS AS IS!!! It deletes all the redis keys. I have improved on this answer and added export of the keys and the values to CSV. – Gerhard Powell Jun 02 '22 at 16:51
  • Note that for true "scanning in batches" experience we have `count` argument of the `scan_iter` method: https://redis-py-doc.readthedocs.io/en/master/index.html?highlight=scan#redis.Redis.scan_iter – eugenesqr Mar 31 '23 at 11:26
79

Yes, use keys() on a StrictRedis client:

>>> import redis
>>> r = redis.StrictRedis(host=YOUR_HOST, port=YOUR_PORT, db=YOUR_DB)
>>> r.keys()

Giving no pattern will fetch all of them, since the pattern defaults to *. As per the page linked:

keys(pattern='*')

Returns a list of keys matching pattern
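
For illustration, a minimal sketch (local host/port assumed); keys() returns bytes unless the client is created with decode_responses=True, and it accepts a glob pattern:

>>> import redis
>>> r = redis.StrictRedis(host='localhost', port=6379, db=0, decode_responses=True)
>>> r.keys()          # all keys, since the pattern defaults to '*'
>>> r.keys('user:*')  # only keys matching the glob pattern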

fedorqui
  • 23
    Be aware that the use of this command is discouraged on production servers. If you have a high number of keys, your Redis instance will not respond to any other request while processing this one, that may take a rather long time to complete. – Pascal Le Merrer Mar 08 '14 at 08:40
  • 3
    Consider adding a reference to `SCAN` command as it is now a preferred way to get all keys with O(1) time complexity of each request. (and O(N) for all of the requests) – Kirill Zaitsev Mar 08 '14 at 14:11
  • 2
    `r.keys()` is quite slow when you are trying to match a pattern and not just returning all keys. Consider using `scan` as suggested in the answer below – cnikolaou Apr 27 '17 at 13:01
  • 2
    @KonstantineNikolaou I notified the OP and he gladly unaccepted my answer to accept the other one. Thanks for reporting, I had used this so long ago but I now lack the focus on the topic to check what is best. – fedorqui Apr 27 '17 at 14:19
  • @fedorqui glad to hear that – cnikolaou Apr 29 '17 at 14:22
  • Don't use keys(). Use scan() instead. With the added benefit of pattern matching. – Saman Hamidi Apr 28 '21 at 09:34
  • 1
    @SoroushParsa if you uphold the `scan()` option, then upvote the other answer. In fact, mine was the accepted one and I asked the OP to accept the other one. To me, downvoting this one per se doesn't really match the "this answer is not useful" thingie. – fedorqui Apr 28 '21 at 10:43
  • @fedorqui'SOstopharming' valid point. I did in fact upvote the accepted answer. – Saman Hamidi May 01 '21 at 06:25
18
import redis
r = redis.Redis("localhost", 6379)
for key in r.scan_iter():
    print(key)  # on Python 2: print key

using the redis-py library

scan command

Available since 2.8.0.

Time complexity: O(1) for every call. O(N) for a complete iteration, including enough command calls for the cursor to return back to 0. N is the number of elements inside the collection.
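
scan_iter() is a wrapper around that SCAN cursor loop; driving it by hand with redis-py's scan() looks like this (a minimal sketch, match and count are example values):

import redis

r = redis.Redis("localhost", 6379)

cursor = 0
while True:
    # scan() returns the next cursor plus a batch of keys;
    # the iteration is complete when the cursor comes back as 0
    cursor, keys = r.scan(cursor=cursor, match='*', count=100)
    for key in keys:
        print(key)
    if cursor == 0:
        break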

Seg-mel
  • 63
  • 1
  • 3
Black_Rider
  • 1,465
  • 2
  • 16
  • 18
3

I'd like to add some example code to go with Patrick's answer and others. This shows results using both the keys() and the scan_iter() techniques. Please note that Python 3 uses zip_longest instead of izip_longest. The code below loops through all the keys and displays them. I set the batch size as a variable to 12, to make the output smaller.

I wrote this to better understand how the batching of keys worked.

import redis
from itertools import zip_longest

# connection/building of my redisObj omitted here (decode_responses=True is assumed, so keys and values come back as str)

# iterate a list in batches of size n
def batcher(iterable, n):
    args = [iter(iterable)] * n
    return zip_longest(*args)
    
result1 = redisObj.get("TestEN")
print(result1)
result2 = redisObj.get("TestES")
print(result2)

print("\n\nLoop through all keys:")
keys = redisObj.keys('*')
counter = 0
print("len(keys)=", len(keys))
for key in keys:
    counter +=1
    print(counter, "key=" + key, " value=" + redisObj.get(key))

print("\n\nLoop through all keys in batches (using itertools)")
# loop through all keys in batches of 12
counter = 0
batch_counter = 0
print("Try scan_iter:")
for keybatch in batcher(redisObj.scan_iter('*'), 12):
    batch_counter +=1
    print(batch_counter, "keybatch=", keybatch)
    for key in keybatch:
        if key is not None:
            counter += 1
            print("  ", counter, "key=" + key, " value=" + redisObj.get(key))

Example output:

Loop through all keys:
len(keys)= 2
1 key=TestES  value=Ola Mundo
2 key=TestEN  value=Hello World


Loop through all keys in batches (using itertools)
Try scan_iter:
1 keybatch= ('TestES', 'TestEN', None, None, None, None, None, None, None, None, None, None)
   1 key=TestES  value=Ola Mundo
   2 key=TestEN  value=Hello World

Note Redis commands are single-threaded, so doing a keys() can block other Redis activity. See the excellent post here that explains that in more detail: SCAN vs KEYS performance in Redis

NealWalters
2

An addition to the accepted answer above.

scan_iter can be used with a count parameter that tells Redis how many keys to search through during a single iteration. This can speed up key fetching significantly, especially when used with a matching pattern and on big key spaces.

Be careful though when using very high values for count, since that may hurt the performance of other concurrent queries.

Here's an article with more details and some benchmarks: https://docs.keydb.dev/blog/2020/08/10/blog-post/
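
For illustration, a minimal sketch (the pattern and count are example values):

import redis

r = redis.Redis("localhost", 6379)

# count is a hint for how many keys Redis examines per SCAN call;
# larger values mean fewer round trips at the cost of longer calls
for key in r.scan_iter(match='user:*', count=1000):
    print(key)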

scdekov
1

I have improved on Patrick's and Neal's code and added export to CSV:

import csv
import redis
from itertools import zip_longest

redisObj = redis.StrictRedis(host='localhost', port=6379, db=0, decode_responses=True)
searchStr = ""

# iterate a list in batches of size n
def batcher(iterable, n):
    args = [iter(iterable)] * n
    return zip_longest(*args)

with open('redis.csv', 'w', newline='') as csvfile:
    fieldnames = ['key', 'value']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    print("\n\nLoop through all keys in batches (using itertools)")
    counter = 0
    batch_counter = 0
    print("Try scan_iter:")
    for keybatch in batcher(redisObj.scan_iter('*'), 500):
        batch_counter +=1
        #print(batch_counter, "keybatch=", keybatch)
        for key in keybatch:
            if key != None:
                counter += 1
                val = ""
                if (searchStr in key):
                    valType = redisObj.type(key)
                    print(valType)
                    match valType:  # structural pattern matching requires Python 3.10+
                        case "string":
                            val = redisObj.get(key)
                        case "list":
                            valList = redisObj.lrange(key, 0, -1)
                            val = '\n'.join(valList)
                        case "set":
                            valList = redisObj.smembers(key)
                            val = '\n'.join(valList)
                        case "zset":
                            valDict = redisObj.zrange(key, 0, -1, False, True)
                            val = '\n'.join(['='.join(i) for i in valDict.items()])
                        case "hash":
                            valDict = redisObj.hgetall(key)
                            val = '\n'.join(['='.join(i) for i in valDict.items()])
                        case "stream":
                            val = ""
                        case _:
                            val = ""
                print("  ", counter, "key=" + key, " value=" + val)
                writer.writerow({'key': key, 'value': val})
Gerhard Powell