I am retrieving a large set of data from BigQuery, performing some transformations on it, and storing the result in a pandas DataFrame.
Since this data doesn't need to be fetched from the database on every run, I want to cache it so that I can avoid re-querying BigQuery and repeating the same transformations.
Let's assume that the data is larger than 512 MB, which is the maximum size of a string value in Redis.
I am considering using Redis Cluster to distribute the cached data across nodes, but I don't know how I should store it.
There are two approaches I can think of:
- Based on this thread and this one, we can compress the dataframe using `zlib` and store it under a single key. In this case, I am not sure whether Redis automatically splits a value larger than 512 MB across the cluster nodes (see the first sketch after this list).
- Storing each row of the dataframe as a separate key in Redis. In this case, I am not sure how to read the data back into a pandas DataFrame (see the second sketch below).