
I have a Redis DB with thousands of keys, and I'm currently running the following line to get all of them:

string[] keysArr = keys.Select(key => (string)key).ToArray();

But because I have a lot of keys this takes a long time. I want to limit the number of keys being read. So I'm trying to run an execute command where I get 100 keys at a time:

var keys = Redis.Connection.GetDatabase(dbNum).Execute("scan", 0, "count", 100);

This runs the command successfully; however, I'm unable to access the value as it is private, and unable to cast it even though the RedisResult class provides an explicit cast to it:

public static explicit operator string[] (RedisResult result);

Any ideas on how to get x keys at a time from Redis?

Thanks

Person

2 Answers


SE.Redis has a .Keys() method on the IServer API which fully encapsulates the semantics of SCAN. If possible, just use this method and consume the data 100 at a time. It is usually pretty easy to write a batching function, e.g.

ExecuteInBatches(server.Keys(), 100, batch => DoSomething(batch));

with:

public void ExecuteInBatches<T>(IEnumerable<T> source, int batchSize,
        Action<List<T>> action)
{
    List<T> batch = new List<T>();
    foreach(var item in source) {
        batch.Add(item);
        if(batch.Count == batchSize) {
             action(batch);
             batch = new List<T>(); // in case the callback stores it
        }
    }
    if (batch.Count != 0) {
        action(batch); // any leftovers
    }
}

The enumerator will worry about advancing the cursor.


You can use Execute, but: that is a lot of work! Also, SCAN makes no guarantees about how many will be returned per page; it can be zero - it can be 3 times what you asked for. It is ... guidance only.

Incidentally, the reason that the cast fails is because SCAN doesn't return a string[] - it returns an array of two items, the first of which is the "next" cursor, the second is the keys. So maybe:

var arr = (RedisResult[])server.Execute("scan", 0);
var nextCursor = (int)arr[0];
var keys = (RedisKey[])arr[1];
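
Putting those pieces together, a manual loop that follows the cursor until the server reports completion might look like the sketch below. This is only an illustration of the cursor mechanics described above: it assumes a live Redis server, a `server` obtained from SE.Redis as in the answer, and a hypothetical per-page callback `DoSomething`:

```csharp
// Sketch only: drive SCAN by hand, following the cursor until it returns 0.
// Assumes `server` is an IServer and DoSomething is your own per-page handler.
long cursor = 0;
do
{
    var page = (RedisResult[])server.Execute("scan", cursor, "count", 100);
    cursor = (long)page[0];               // "next" cursor; 0 means the scan is complete
    var pageKeys = (RedisKey[])page[1];   // keys in this page (may be fewer or more than 100)
    DoSomething(pageKeys);
} while (cursor != 0);
```

As the answer says, `IServer.Keys` already does exactly this for you, so a loop like this is only worth writing if you need the raw cursor for some reason.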

But all this is doing is re-implementing IServer.Keys, the hard way (and significantly less efficiently - RedisResult is not the ideal way to store data; it is simply necessary in the case of Execute and ScriptEvaluate).

Marc Gravell
  • Instead of `(int)arr[0];` should the cursor value in the `RedisResult` array be cast to `Int64`? I think that would be consistent with `IScanningResult.Cursor` being an `Int64`. (Bless my heart, I am trying to use `ExecuteCommand` to use `SCAN` directly). – Stephen Swensen Apr 08 '23 at 15:22
  • 1
    @StephenSwensen quite possibly; actually, thinking about it there is also a case for using `string` - IIRC there was a redis cluster proxy (which may now be abandoned) which makes that value non-integer (although this would have come about *after* this answer) – Marc Gravell Apr 08 '23 at 21:45

I would use the .Take() method, outlined by Microsoft here.

Returns a specified number of contiguous elements from the start of a sequence.

It would look something like this:

//limit to 100
var keysArr = keys.Select(key => (string)key).Take(100).ToArray();
AussieJoe
  • How would I get the next hundred? From the scan command you get a hash for the next query – Person Jul 25 '18 at 15:18
  • @Person you would need to implement a paging `.Skip()` routine, like this: https://stackoverflow.com/questions/15414347/how-to-loop-through-ienumerable-in-batches – AussieJoe Jul 25 '18 at 15:22
  • using skip and take here to get the pages is ... frankly ... going to be very bad in this scenario; that intent cannot be expressed to the server, so you're implementing a triangular query (meaning: the first items are cheap, the next page is more expensive, and so on ... until the last page is absurdly expensive). This is not a good way to solve this problem *in this specific case* – Marc Gravell Jul 25 '18 at 16:03
  • @MarcGravell I think it depends on the scope of the desired output. It's not necessarily bad, if you only need a few hundred. – AussieJoe Jul 25 '18 at 16:29