42

After storing a DataFrame in redis and reading it back, redis returns a string, and I can't figure out how to convert this str back into a DataFrame.

How can I do both steps properly?

jgritty
Alex Luya

6 Answers6

59

set:

redisConn.set("key", df.to_msgpack(compress='zlib'))

get:

pd.read_msgpack(redisConn.get("key"))
Kevin Ghaboosi
Alex Luya
  • As of pandas 0.25.1, `to_msgpack` is deprecated in favor of `pyarrow`. Check [this SO post](https://stackoverflow.com/a/57986261/4126114) for a full example of `pandas + pyarrow + redis` – Shadi Sep 18 '19 at 06:28
  • `pyarrow` is deprecating serialization/deserialization in 2.0.0 https://arrow.apache.org/blog/2020/10/22/2.0.0-release/ – binarymason Apr 23 '21 at 21:59
9

I couldn't use msgpack because of Decimal objects in my dataframe. Instead I combined pickle and zlib together like this, assuming a dataframe df and a local instance of Redis:

import pickle
import redis
import zlib

EXPIRATION_SECONDS = 600

r = redis.StrictRedis(host='localhost', port=6379, db=0)

# Set
r.setex("key", EXPIRATION_SECONDS, zlib.compress(pickle.dumps(df)))

# Get
rehydrated_df = pickle.loads(zlib.decompress(r.get("key")))

There isn't anything dataframe specific about this.
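
To illustrate that point, here is a minimal sketch of the same pickle-plus-zlib round trip applied to a plain dict holding a Decimal. InMemoryStore is a hypothetical stand-in for a Redis client (only set and get, no expiry) so the snippet runs without a server; store_object and load_object are illustrative names, not part of any library.

```python
import pickle
import zlib
from decimal import Decimal


class InMemoryStore:
    """Hypothetical stand-in for a Redis client: keeps bytes under string keys."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        # Like redis-py, return None when the key is missing
        return self._data.get(key)


def store_object(client, key, obj):
    """Pickle any picklable object, compress the bytes, and write them."""
    client.set(key, zlib.compress(pickle.dumps(obj)))


def load_object(client, key):
    """Read the bytes back and reverse both steps; None if the key is absent."""
    blob = client.get(key)
    if blob is None:
        return None
    return pickle.loads(zlib.decompress(blob))


client = InMemoryStore()
record = {"price": Decimal("19.99"), "tags": ["a", "b"]}
store_object(client, "key", record)
restored = load_object(client, "key")
```

With a real connection, the client r from the snippet above could be passed in place of InMemoryStore. Only unpickle data that comes from a Redis instance you control.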

Caveats

  • the other answer using msgpack is better -- use it if it works for you
  • pickling can be dangerous -- your Redis server needs to be secure or you're asking for trouble
Mark Chackerian
5

For caching a dataframe use this.

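import pandas as pd
import pyarrow as pa
import redis

def cache_df(alias, df):
    # 'host', 'port' and 'db' are placeholders; substitute your own connection settings
    pool = redis.ConnectionPool(host='host', port='port', db='db')
    cur = redis.Redis(connection_pool=pool)
    # Note: pyarrow deprecated this serialization API in 2.0
    context = pa.default_serialization_context()
    df_compressed = context.serialize(df).to_buffer().to_pybytes()

    res = cur.set(alias, df_compressed)
    if res:
        print('df cached')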

For fetching the cached dataframe use this.

def get_cached_df(alias):
    # 'host', 'port' and 'db' are placeholders; substitute your own connection settings
    pool = redis.ConnectionPool(host='host', port='port', db='db')
    cur = redis.Redis(connection_pool=pool)
    context = pa.default_serialization_context()

    # get() returns None for a missing key, so there is no need to list every key first
    result = cur.get(alias)
    if result is None:
        return None

    dataframe = pd.DataFrame.from_dict(context.deserialize(result))

    return dataframe
keikai
Lucky M.E.
5

to_msgpack is no longer available in recent versions of pandas (it was deprecated in 0.25 and removed in 1.0). Serializing to JSON works instead:

from io import StringIO

import redis
import pandas as pd

# Create a redis client
redisClient = redis.StrictRedis(host='localhost', port=6379, db=0)
# Create a DataFrame
dd = {'ID': ['H576','H577','H578','H600', 'H700'],
  'CD': ['AAAAAAA', 'BBBBB', 'CCCCCC','DDDDDD', 'EEEEEEE']}
df = pd.DataFrame(dd)
data = df.to_json()
redisClient.set('dd', data)
# Retrieve the data (redis returns bytes, so decode before parsing)
blob = redisClient.get('dd')
df_from_redis = pd.read_json(StringIO(blob.decode('utf-8')))
df_from_redis.head()
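
Two details matter on the JSON route: redis-py returns bytes by default, and newer pandas versions warn when read_json receives a literal JSON string rather than a file-like object. The sketch below wraps both concerns in a pair of helpers. DictStore is a hypothetical stand-in for the Redis client so the snippet runs without a server, put_df and get_df are illustrative names, and orient="split" is my own choice (not part of the answer above) to keep columns and index explicit in the payload.

```python
from io import StringIO

import pandas as pd


class DictStore:
    """Hypothetical stand-in for a Redis client; get() returns bytes like redis-py."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # redis-py encodes str values to bytes before sending them
        self._data[key] = value.encode("utf-8") if isinstance(value, str) else value
        return True

    def get(self, key):
        return self._data.get(key)


def put_df(client, key, df):
    """Serialize the DataFrame to JSON text and store it under key."""
    client.set(key, df.to_json(orient="split"))


def get_df(client, key):
    """Fetch the bytes, decode them, and parse the JSON; None if the key is absent."""
    blob = client.get(key)
    if blob is None:
        return None
    return pd.read_json(StringIO(blob.decode("utf-8")), orient="split")


client = DictStore()
df = pd.DataFrame({"ID": ["H576", "H577"], "CD": ["AAAAAAA", "BBBBB"]})
put_df(client, "dd", df)
restored = get_df(client, "dd")
```

JSON does not preserve every dtype (for example Decimal or categorical columns), so check the restored frame if exact types matter.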

ijasanchez
2
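from io import StringIO

import pandas as pd
import redis

r = redis.Redis(host='localhost', port=6379, db=0)
df = pd.DataFrame([1, 2])
r.setex('df', 100, df.to_json())  # the key expires after 100 seconds
raw = r.get('df')  # bytes
df = pd.read_json(StringIO(raw.decode('utf-8')))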
Quantum Dreamer
  • Remember to offer an explanation, and not just code. It's important to help readers understand _why_ your code works, not just _what_ to do. This is especially important when answering old questions with established answers—in this case, an accepted answer from nearly four years ago with quite a few votes. What value does your approach offer beyond that suggestion? Are you using new techniques that are faster, cleaner, or more reliable? – Jeremy Caney Jun 13 '20 at 00:28
0

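It's 2021, which means df.to_msgpack() is deprecated and pyarrow has deprecated its custom serialization functionality as of pyarrow 2.0 (see the "Arbitrary Object Serialization" section on pyarrow's serialization page).

That leaves good, trusty msgpack to serialize objects so that they can be stored in redis. Note that msgpack.packb only handles plain Python types (dicts, lists, strings, numbers), so a DataFrame has to be converted first, for example with df.to_dict(), and rebuilt with pd.DataFrame after unpacking.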

import msgpack
import redis 

# ...Writing to redis (already have data & a redis connection client)
redis_client.set('data_key_name', msgpack.packb(data))

# ...Retrieving from redis
retrieved_data = msgpack.unpackb(redis_client.get('data_key_name'))