
I'm wondering if there is a way to load an object that was pickled in Python 2.4, with Python 3.4.

I've been running 2to3 on a large amount of company legacy code to get it up to date.

Having done this, when running the file I get the following error:

  File "H:\fixers - 3.4\addressfixer - 3.4\trunk\lib\address\address_generic.py", line 382, in read_ref_files
    d = pickle.load(open(mshelffile, 'rb'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

Looking at the pickled object in question, it's a dict nested in a dict, with keys and values of type str.

So my question is: is there a way to load an object, originally pickled in Python 2.4, with Python 3.4?

NDevox
  • Does Python 2.4 have the `json` module? Perhaps you could write a 2.4 script that unpickles the object and saves it as a json object, and then write a 3.4 script that reads the json object and saves it as a 3.4-compatible pickle object. This would be a one-time operation that you run on all your pickle files. – Kevin Jan 29 '15 at 15:40
  • I was thinking along similar lines, considering that these are dicts I reckon I could just change sys.stdout to a file and print them out, but I want to see if I can load them first – NDevox Jan 29 '15 at 15:43
  • Related question having to do with datetimes specifically: https://stackoverflow.com/questions/24805105/unpickling-python2-datetime-under-python3 – John Y Jun 24 '19 at 17:12

2 Answers


You'll have to tell pickle.load() how to convert Python 2 bytestring data to Python 3 strings, or you can tell pickle to leave them as bytes.

The default is to try and decode all string data as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.

Setting the encoding to latin1 allows you to load the data directly:

import pickle

with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='latin1')

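but you'll need to verify that none of your strings are decoded using the wrong codec. Latin-1 never raises a decoding error, because it maps the byte values 0-255 directly to the first 256 Unicode code points, so text that was actually stored in another encoding (UTF-8, for example) will load without complaint but come out garbled.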
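To make that caveat concrete, here is a minimal sketch. `PY2_PICKLE` is a hand-written stand-in (an assumption, not data from the question) for what Python 2 would write at protocol 2 for a `str` holding UTF-8 encoded text.

```python
import pickle

# Hand-written protocol 2 stream, equivalent to Python 2's
# pickle.dumps('caf\xc3\xa9', 2); illustrative sample data only.
PY2_PICKLE = b'\x80\x02U\x05caf\xc3\xa9q\x00.'

# The default encoding is ASCII, which fails on the non-ASCII bytes.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# Latin-1 never fails, but UTF-8 text comes out garbled.
as_latin1 = pickle.loads(PY2_PICKLE, encoding='latin1')

# If the original bytes really were UTF-8, a round trip recovers the text.
repaired = as_latin1.encode('latin1').decode('utf-8')
```

The round trip on the last line only makes sense if you know the real codec of your data; UTF-8 is just the assumption used here.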

The alternative would be to load the data with encoding='bytes', and decode all bytes keys and values afterwards.
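A rough sketch of that second approach, for nested dicts like the one in the question. `decode_bytes` is a hypothetical helper name, `PY2_NESTED` is hand-written sample data, and UTF-8 is an assumed codec that you would swap for whatever your data really uses.

```python
import pickle


def decode_bytes(obj, encoding='utf-8'):
    """Recursively turn bytes into str inside dicts, lists and tuples."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_bytes(k, encoding): decode_bytes(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(decode_bytes(item, encoding) for item in obj)
    return obj  # numbers, None, etc. are left untouched


# Hand-written protocol 2 stream, equivalent to Python 2's
# pickle.dumps({'a': {'b': 'caf\xc3\xa9'}}, 2); sample data only.
PY2_NESTED = b'\x80\x02}q\x00U\x01aq\x01}q\x02U\x01bq\x03U\x05caf\xc3\xa9q\x04ss.'

raw = pickle.loads(PY2_NESTED, encoding='bytes')  # keys and values are bytes
decoded = decode_bytes(raw)                       # keys and values are str
```

Decoding explicitly like this fails loudly on bytes that do not match the codec, instead of silently producing wrong text.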

Note that in Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling Python 2 datetime objects is broken unless you use encoding='bytes'.
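As an illustration, the stream below is a hand-written stand-in for a Python 2 pickle of `datetime.datetime(2015, 1, 29, 15, 40)` at protocol 2 (sample data, not taken from the question). The `encoding='bytes'` call should work on any Python 3; the `encoding='latin1'` call assumes one of the fixed versions.

```python
import datetime
import pickle

# Hand-written protocol 2 stream, equivalent to Python 2's
# pickle.dumps(datetime.datetime(2015, 1, 29, 15, 40), 2).
PY2_DATETIME = (b'\x80\x02cdatetime\ndatetime\nq\x00'
                b'U\n\x07\xdf\x01\x1d\x0f(\x00\x00\x00\x00q\x01\x85q\x02Rq\x03.')

# The packed datetime state reaches the constructor as bytes.
via_bytes = pickle.loads(PY2_DATETIME, encoding='bytes')

# Assumes a fixed Python 3 version: the state arrives as a Latin-1 str.
via_latin1 = pickle.loads(PY2_DATETIME, encoding='latin1')

expected = datetime.datetime(2015, 1, 29, 15, 40)
```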

Martijn Pieters
  • How could this be made backward compatible with Python 2? Apparently, encoding argument isn't present for Python 2. – EpicAdv Jan 31 '17 at 00:26
  • @EpicAdv: you don't need to make this code compatible with Python 2; this question is about how to load Python 2 pickles into Python 3. Drop the `encoding` keyword altogether for Python 2. – Martijn Pieters Jan 31 '17 at 08:42
  • @EpicAdv: You can create a pickle_options dictionary that is either empty for python 2 or has `'encoding': 'latin1'` and send `**pickle_options` to pickle. This way it should run in both versions. – pipefish Feb 13 '17 at 14:16
  • @pipefish - Clever, but *somewhere* you have to detect which version you're using, so you could also more straightforwardly just do the call differently (one with and one without the extra argument) depending on the version. But at least you got the gist of EpicAdv's comment, which Martijn's comment doesn't address at all. – John Y Apr 30 '19 at 19:49
  • I realize the `datetime` comment was not the main thrust of this answer, but for future readers, I'd like to point out that even the "fixed" versions of Python 3 still require `encoding='latin-1'` to unpickle Python 2 datetimes. If your pickled Python 2 data happens to include both datetimes and bytestrings encoded in something other than Latin-1, then you might still be better off using `encoding='bytes'` after all. – John Y Jun 24 '19 at 17:06

Using encoding='latin1' can cause issues when your object contains numpy arrays.

Using encoding='bytes' will be better.

Please see this answer for a complete explanation of using encoding='bytes'.
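
The answer does not spell out what goes wrong, so the following is only a guess at one plausible failure mode. It uses a made-up four-byte payload (not a real image or numpy array) stored in a Python 2 `str`, with a hand-written pickle stream.

```python
import pickle

# Hand-written protocol 2 stream, equivalent to Python 2's
# pickle.dumps('\x89PNG', 2); the payload is raw binary data, not text.
PY2_BINARY = b'\x80\x02U\x04\x89PNGq\x00.'

# encoding='bytes' hands the payload back unchanged.
as_bytes = pickle.loads(PY2_BINARY, encoding='bytes')

# encoding='latin1' turns the payload into a str.
as_text = pickle.loads(PY2_BINARY, encoding='latin1')

# Encoding back with Latin-1 restores the original bytes...
recovered = as_text.encode('latin1')

# ...but encoding with UTF-8 (the str.encode() default) changes them.
corrupted = as_text.encode('utf-8')
```

So Latin-1 itself is lossless here; the corruption in this sketch comes from converting the resulting str back to bytes with a different codec.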

djvg
Sreeragh A R
  • Which issues? What should I be careful of? using `bytes` makes strings into bytes(), so I prefer `latin1` if possible, but it is not clear to me what the problem is. – Gulzar Jan 01 '20 at 14:25
  • @sreeragh-a-r: Could you give an example of the issues you encountered? I have a two-dimensional `numpy.ndarray` (numpy 1.14) pickled in Python 2.7 using `cPickle.dumps()`, and unpickling in Python 3 with `pickle.loads(..., encoding='latin1')` works fine. – djvg Jan 17 '20 at 17:04
  • @djvg I faced issues when I had to pickle images as image string and unpickle them. The code can be found here. https://gist.github.com/sreeragh-ar/70205db3a43badbfa69f758faa898be3 – Sreeragh A R Jan 17 '20 at 18:06
  • @Gulzar Please see the above gist for the problem. Images were getting corrupted after unpickling. – Sreeragh A R Jan 17 '20 at 18:10
  • if you are NOT using `np.arrays` save yourself some hassle and keep `encoding='latin1'` so you don't have to decode all `bytes` to `str` – jboxxx May 20 '21 at 19:42
  • @jboxxx or if your strings contain only ASCII characters, otherwise you'll have to use bytes. – Javed Jan 25 '23 at 13:50