8

I am having a tough time understanding how to use binary data with Redis. I want to use the command

   set '{binary data}' 'Alex'

What if the binary data actually includes a quote character or \r\n? I know I can escape characters, but is there an official list of characters I need to escape?

user1978109

4 Answers

8

Arbitrary bytes can be input in redis-cli using hexadecimal notation, e.g.

set "\x00\xAB\x20" "some value"
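As a rough sketch of how that notation can be generated, the Python snippet below renders every byte as a `\xNN` escape inside double quotes, so quotes and CR/LF in the data never reach redis-cli literally. The helper name `to_redis_cli_literal` is made up for illustration and is not part of redis-cli or any Redis library.

```python
def to_redis_cli_literal(data: bytes) -> str:
    """Render bytes as a double-quoted redis-cli argument using \\xNN escapes.

    Hypothetical helper: every byte is escaped, including printable ones,
    so no character in the data needs special-case handling.
    """
    return '"' + ''.join(f'\\x{b:02X}' for b in data) + '"'


# Build the command line from the answer above out of raw bytes.
key = bytes([0x00, 0xAB, 0x20])
print(f'set {to_redis_cli_literal(key)} "some value"')
```

Escaping every byte is longer than necessary for printable data, but it avoids having to decide which characters are safe.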
ghik
3

There's no need to do anything special with the data itself. All Redis strings are binary safe.

Your problem relates to redis-cli (which is a very nice redis client for getting to know Redis, but almost never what you want in production, because of usage and performance issues).

Your problem also relates to common (bash/sh/other) terminal escaping. Here's a nice explanation.

I suggest you use Python for this, or any other language you are comfortable with.

Example:

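import redis

# Connect to a local Redis server on the default port
cli = redis.Redis('localhost', 6379)

# Open in binary mode so keys and values stay raw bytes
with open('data.txt', 'rb') as f:
    for d in f:
        # Lines are bytes here, so the separator must be bytes as well
        t = d.partition(b'\t')
        # Key is the part before the first tab; trailing whitespace is stripped from the value
        cli.set(t[0], t[2].rstrip())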
Tw Bert
  • What do you mean redis-cli is only good for getting to know redis? Are you talking about using it from the command line and python or both? I currently use it with python, and did notice that it took a while to load a large amount of data, so now I'm working on a mass insertion technique for at least a million lines (set commands). What is the alternative? – spacedustpi Feb 10 '19 at 02:18
    @spacedustpi Emphasis on: **almost** never what you want in production. Loading a large amount of data into Redis takes some time. At our company, we found that the best performing mass insertion technique (and yes, we sometimes dump hundreds of millions of rows) is: serialize to msgpack and use `EVALSHA` to execute a preloaded Lua script. For serialization, we wrote a Cython reserializer from a proprietary export format directly to msgpack. Hope this helps, and as always: avoid premature optimization. The Redis docs have some nice guidelines on simpler mass insertion techniques. – Tw Bert Feb 10 '19 at 13:14
  • Ah, Cython makes sense to take advantage of available speed gains. I don't know much about reserializing, and I don't understand how it would be helpful here unless such a tactic shrinks the size of the message or turns it into something Redis understands more readily. By msgpack, I assume you mean the group of set, key, value commands that you want to load efficiently. I've just started a tutorial on how to use Cython in Jupyter Notebook (probably not the most efficient IDE for this purpose, but one I know well), and I am loading a group of about a million messages from a text file. – spacedustpi Feb 10 '19 at 16:01
    @spacedustpi "turns it into something Redis understands more readily" -> exactly that. Inside Redis-Lua, the msgpack deserializer works very efficiently. If you have a lot of floats/ints/bools, they take up far fewer bytes over the wire compared to JSON or other non-binary serialization formats. Then again, there are cases where JSON is more efficient. And inside Redis-Lua, JSON is just as easy. Related: you might find [this post](https://groups.google.com/forum/#!searchin/redis-db/lua$20performance|sort:date/redis-db/D4V6kDJNDsI/qX7z65b8dUwJ) an interesting read. – Tw Bert Feb 10 '19 at 21:48
    @spacedustpi And please don't get sidetracked by my Cython remark. That was a very specific scenario, where we could circumvent 'turning the variables into CPython-native datatypes' and just pass the plain bytes as-is to Redis. The performance increase was substantial (a factor of 3 if I remember correctly) but again, a very specific use case. – Tw Bert Feb 10 '19 at 21:56
1

You can send the command to Redis as an array of bulk strings, with no need to escape characters or Base64-encode the data. Since each bulk string begins with its data length, Redis doesn't try to parse the data bytes and instead just jumps to the end to verify the terminating CR/LF pair:

*3<crlf>
$3<crlf>SET<crlf>
${binary_key_length}<crlf>{binary_key_data}<crlf>
${binary_data_length}<crlf>{binary_data}<crlf>
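A minimal Python sketch of that framing, assuming you build the bytes yourself rather than relying on a client library; the function name `encode_resp_command` is hypothetical:

```python
def encode_resp_command(*args: bytes) -> bytes:
    """Frame a command as a RESP array of bulk strings.

    Each argument is prefixed with its byte length, so the payload may
    contain quotes, NUL bytes, or CR/LF without any escaping.
    """
    parts = [b"*%d\r\n" % len(args)]          # array header: number of arguments
    for arg in args:
        parts.append(b"$%d\r\n" % len(arg))   # bulk string header: length in bytes
        parts.append(arg + b"\r\n")           # raw data followed by the terminating CR/LF
    return b"".join(parts)


# A key containing a NUL byte, both quote characters, and CR/LF.
payload = encode_resp_command(b"SET", b"\x00'\"\r\n", b"Alex")
print(payload)
```

The resulting bytes could then be written to a TCP connection to the server, or collected in a file for bulk loading; both are left out here to keep the sketch free of any network dependency.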
bmalec
0

I found it is best to use the Redis protocol for this, because the length of each argument is declared before its data, so the boundaries are known without any escaping.

user1978109