
TL;DR version:

I have a long encoded JSON payload that I store in Redis as a key. I would like to know whether hashing the key before storing it will improve lookup performance, and which hashing algorithm is recommended (I'm considering MD5/SHA-1).

P.S. I'm using Python.

Other notes:

  • The TTL for the key is short (30 seconds), so I'm not worried about hash collisions
  • I only need to check whether the key exists in Redis

Long story version:

I have a stream of transactions in JSON, encoded in protobuf, flowing to my application via a message queue at a high rate. I run worker nodes that read the data from the queue and process it. However, I realized that duplicates were sometimes being sent.

My solution was to store the data in a "global" cache (Redis) that my workers check before attempting to process a message. Because the flow rate is high, decoding the data and reading it is expensive, so I'm storing the encoded strings whole.

Transactions expire after 30 seconds, so I use a TTL of 30 seconds.

I'm therefore wondering whether hashing the strings before storing them would be a good idea, as I only need to check for existence.

Thanks for reading.

Super Kai - Kazuya Ito
silvercondor

1 Answer


You really don't need a cryptographic hash. You want the fastest non-cryptographic algorithm that is good at collision avoidance.

Here is a good discussion of the various options: "Fastest hash for non-cryptographic uses?"
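As a rough illustration of turning the long payload into a short, fixed-length key, here is a minimal sketch. Python's standard library does not ship a hash like xxHash, so this uses `hashlib.blake2b` with a 16-byte digest as a stand-in; it is a cryptographic hash, just a fast one, and a third-party non-cryptographic hash could be swapped in. The `short_key` helper, the `txn:` prefix, and the sample payload are made up for the example.

```python
import hashlib


def short_key(payload: bytes, prefix: str = "txn:") -> str:
    """Return a fixed-length key derived from the raw encoded payload.

    Hypothetical helper: the name and the "txn:" prefix are illustrative.
    """
    # A 16-byte digest gives 32 hex characters regardless of payload size.
    digest = hashlib.blake2b(payload, digest_size=16).hexdigest()
    return prefix + digest


# Stand-in for a long encoded message; real payloads come from the queue.
payload = b"\x0a\x10example-transaction" * 50
key = short_key(payload)
print(len(payload), len(key))
```

The key length stays constant however large the payload gets, which is the property that matters here.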

The Redis documentation discusses optimal key size in the "Redis keys" section of https://redis.io/topics/data-types-intro.
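To tie the shorter key back to the existence check in the question, one possible shape for the worker-side test is sketched below. It assumes a client object with a redis-py style `set(name, value, nx=..., ex=...)` method; `seen_before` and `TTL_SECONDS` are hypothetical names, and `hashlib.blake2b` is again only a standard-library stand-in for whichever fast hash you pick.

```python
import hashlib

TTL_SECONDS = 30  # matches the 30-second expiry described in the question


def seen_before(client, payload: bytes) -> bool:
    """Return True if this payload was already recorded within the TTL.

    `client` is assumed to behave like a redis-py client, i.e. to expose
    set(name, value, nx=..., ex=...).
    """
    key = "txn:" + hashlib.blake2b(payload, digest_size=16).hexdigest()
    # SET with NX and EX writes the key only if it is absent, so the check
    # and the write happen in one round trip. redis-py returns None when
    # the key already exists and NX prevented the write.
    was_set = client.set(key, b"1", nx=True, ex=TTL_SECONDS)
    return not was_set
```

With redis-py installed, this could be called as `seen_before(redis.Redis(), raw_message)`. Doing the check and the write in a single command also avoids two workers both deciding that the same message is new.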

Frank Yellin