I have a MySQL database in which I want to store pickled data.
The data can be for instance a dictionary, which may contain unicode, e.g.
data = {1 : u'é'}
and the database (MySQL) uses UTF-8.
When I pickle,
import pickle
pickled_data = pickle.dumps(data)
print type(pickled_data) # prints <type 'str'>
the resulting pickled_data is a string.
When I try to store this in a database (e.g. in a Textfield), this can cause problems. In particular, at some point I get a
UnicodeDecodeError "'utf8' codec can't decode byte 0xe9 in position X"
when trying to save the pickled_data in the database. This makes sense, because pickled_data can contain bytes that are not valid UTF-8. My question is: how do I store pickled_data in a UTF-8 database?
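The error can be reproduced without any database. The following is a minimal sketch, assuming pickle protocol 0 (the default in Python 2, requested explicitly here so it behaves the same on Python 3); the name decode_error is only for illustration.

```python
import pickle

# u'\xe9' is u'é', written as an escape to avoid source-encoding issues.
data = {1: u'\xe9'}

# Protocol 0 is the Python 2 default; it is passed explicitly so the
# sketch also behaves this way on Python 3.
pickled_data = pickle.dumps(data, protocol=0)

# Protocol 0 writes unicode strings with the raw-unicode-escape codec, so
# u'\xe9' becomes the single byte 0xe9, which is not valid UTF-8 on its own.
try:
    pickled_data.decode('utf-8')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(repr(pickled_data))
print(decode_error)
```

So the pickle itself is fine (pickle.loads(pickled_data) still returns the original dictionary); the failure only appears when something treats the bytes as UTF-8 text.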
I see two possible candidates:
1. Encode the result of pickle.dumps to utf-8 and store it. Before calling pickle.loads, I have to decode it.
2. Store the pickled string in binary format (how?), which forces all characters to be within ascii.
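For the second candidate, one way it might look is to base64-encode the pickle so that only ASCII characters reach the text column. This is just a sketch under that assumption, not necessarily the best approach; to_text and from_text are hypothetical helper names.

```python
import base64
import pickle


def to_text(obj):
    """Pickle obj and return an ASCII-only text string (hypothetical helper)."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    # base64 output uses only ASCII characters, so it is also valid UTF-8.
    return base64.b64encode(raw).decode('ascii')


def from_text(text):
    """Reverse to_text: decode the base64 text and unpickle the bytes."""
    raw = base64.b64decode(text.encode('ascii'))
    return pickle.loads(raw)


data = {1: u'\xe9'}
stored = to_text(data)       # safe to save in a UTF-8 text column
restored = from_text(stored)
print(restored == data)
```

The trade-off is that base64 makes the stored value roughly a third larger and unreadable in the database. A binary column type (e.g. a MySQL BLOB) would avoid the encoding step altogether, since it stores the raw bytes without any character-set conversion.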
My issue is that I don't see what the long-term consequences of choosing one of these options are. Since the change already requires some effort, I'd like to ask for opinions on this, including any better candidates.
(P.S. This is useful in Django, for instance.)