I am working with a huge DataFrame and would like to avoid pickling it between user queries. Can I save the DataFrame in the Flask session and read it back from there, so that no pickling is needed?

I wrote the code below, but I get the error: [17578 rows x 319 columns] is not JSON serializable

#=====================================================================================
#=====================================================================================
@app.route('/start', methods=['GET', 'POST'])
def index():
  if 'catalogueDF' in session:
    if request.method == 'POST':
      query = request.get_json('query')   # Read user query
      df = session['catalogueDF']
      result = str(list(set(df['brandname']))[2])

    else:
      query = request.args.get('query')
      result = 'User query: '+str(query)

  else:
    df = pd.read_excel('errorfree.xlsx', sheetname='Sheet1').fillna('NA')
    df = pd.DataFrame([df[col].astype(str, na=False).str.lower() for col in df]).transpose()
    session['catalogueDF'] = df
    result = 'no query posted yet'

  response = app.response_class(
          response=json.dumps(result),
          status=200,
          mimetype='application/json'
          )
  return response

# Flask start of app
if __name__ == '__main__':
  app.secret_key = os.urandom(24)   # Sessions need encryption
  app.run(debug = True)
SunDante
  • You can learn [How to Ask a good question](http://stackoverflow.com/help/how-to-ask) and create a [Minimal, Complete, and Verifiable](http://stackoverflow.com/help/mcve) example. That makes it easier for us to help you. – Stephen Rauch Apr 09 '17 at 04:19
  • Have tried to improve my query and title. Thanks @Stephen – SunDante Apr 09 '17 at 09:11
  • Thanks for the update, but this is not [minimal or complete](http://stackoverflow.com/help/mcve), and thus not verifiable. Please read the information at the link carefully. There are many things here that I start to ask myself, because your post does not provide the information. As an example, is this a Flask problem or a pandas problem? Are we dealing with a POST or a GET error? What line is the error actually on, and what does the error message say? – Stephen Rauch Apr 09 '17 at 15:38
  • I love dataframes and I love Flask, but I never thought of using them together in this way. What is the motivation? Using Flask's default client-side session, you might be up against size limits: http://stackoverflow.com/questions/16367491/flask-client-side-sessions You might check out Flask-KVSession to allow for server-side session management: http://pythonhosted.org/Flask-KVSession/ – brennan Apr 10 '17 at 15:00

2 Answers

Just to clarify, it looks like you want to store a DataFrame in the Flask session. Values stored in the session must be serializable, i.e. whatever you assign to session['my_object_name'] has to be something the session can serialize, and a DataFrame is not.

I find it easiest to convert it into a dictionary before saving it in the session object:

dict_obj = df.to_dict('list')
session['data'] = dict_obj

To retrieve the session object in another function as a dataframe, convert the dictionary back to the original dataframe:

dict_obj = session.get('data', {})  # fall back to an empty dict; pd.DataFrame("") would raise
df = pd.DataFrame(dict_obj)
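
The round trip above can be tried outside Flask with a minimal sketch like the following. It assumes the default session serializes values as JSON (which is what the error message in the question suggests), so `json.dumps` / `json.loads` stand in for the session here; the sample columns and values are made up for illustration.

```python
import json

import pandas as pd

# Hypothetical sample data standing in for the real catalogue.
df = pd.DataFrame({"brandname": ["acme", "globex"], "price": [10, 20]})

# This is the value that would be assigned to session['data'].
payload = df.to_dict("list")

# Stand-in for the session: encode to JSON text and decode again.
stored_text = json.dumps(payload)
loaded_payload = json.loads(stored_text)

# Rebuild the DataFrame from the plain dictionary.
restored = pd.DataFrame(loaded_payload)
```

Note that only column values survive this way; a custom index or special dtypes (dates, categoricals) would need extra handling.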
Cody Gray - on strike

Note: this method works only with pandas 0.24.2 or lower. MessagePack support was deprecated in later versions and has since been removed.

If I understand your question, it seems you need to store a DataFrame in the Flask session. Unfortunately, Flask sessions don't know how to serialize a pandas DataFrame.

However, if you really need to keep it there, you can store it as binary data using MessagePack.

data = df.to_msgpack()
session['data'] = data

To read the MessagePack data back:

df1 = pd.read_msgpack(session['data'])

Another idea: write the DataFrame to a StringIO buffer and save the resulting string in the session.
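
A minimal sketch of the StringIO idea, assuming CSV as the text format (the answer does not say which format to use). The sample data is hypothetical; `keep_default_na=False` is included because the question fills missing cells with the string 'NA', which `read_csv` would otherwise turn back into NaN.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data; 'NA' mimics the fillna('NA') step in the question.
df = pd.DataFrame({"brandname": ["acme", "NA"], "price": [10, 20]})

# Write the DataFrame into an in-memory text buffer as CSV.
buffer = StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()  # plain str, the value that would go into the session

# Later: wrap the stored string in StringIO again and parse it.
# keep_default_na=False keeps the literal string 'NA' instead of NaN.
restored = pd.read_csv(StringIO(csv_text), keep_default_na=False)
```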

PS. Before you decide to use sessions, please check the size of the session data first.
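
One rough way to do that check is to measure the JSON text the DataFrame would turn into. This is only an estimate under assumptions: the helper name `session_payload_size` and the 4093-byte budget are illustrative (cookie-based sessions are limited to roughly 4 KB in common browsers), and the real cookie is also signed and encoded, so its actual size will differ.

```python
import json

import pandas as pd

# Assumed budget for a cookie-based session; treat as a rough guide only.
COOKIE_LIMIT_BYTES = 4093


def session_payload_size(df: pd.DataFrame) -> int:
    """Return the size in bytes of the DataFrame encoded as JSON text."""
    return len(json.dumps(df.to_dict("list")).encode("utf-8"))


# Hypothetical data: a tiny frame fits, a larger one clearly does not.
small = pd.DataFrame({"brandname": ["acme"] * 10})
large = pd.DataFrame({"brandname": ["acme"] * 10_000})

small_fits = session_payload_size(small) <= COOKIE_LIMIT_BYTES
large_fits = session_payload_size(large) <= COOKIE_LIMIT_BYTES
```

A frame of the size mentioned in the question (17578 rows x 319 columns) would be far beyond such a budget, which is why the comments point to server-side sessions.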

6LYTH3
  • Hi, it looks like msgpack has been removed from pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-msgpack – willll Aug 05 '20 at 09:57
  • @willll Yes, thank you for the update. The key is to know which data types can be stored in sessions and then convert to one of them before saving. – 6LYTH3 Jun 24 '21 at 10:04