13

Python introduced pickle protocol 4 in Python 3.4 and protocol 5 in Python 3.8, which also made protocol 4 the default. How do I open older pickled files in Python 3.8?

I tried:

>>> with open('data_frame_111.pkl','rb') as pfile:
...     x1 = pickle.load(pfile)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks' from '/opt/anaconda3/lib/python3.8/site-packages/pandas/core/internals/blocks.py'>

and

>>> with open('data_frame_111.pkl','rb') as pfile:
...     x1 = unpkl.load(pfile, protocol=4)

However, while protocol is a keyword argument of pickle.dump, pickle.load does not accept it. Instantiating pickle.Unpickler() directly doesn't work either. There should be a way to do this.

In python 3.7, I would import pickle5 and use that to open newer pickles, but can't find documentation on doing the reverse in python 3.8.

eyllanesc
Marc Maxmeister
  • 1
    I don't think the problem is the pickle version, rather, it's the `pandas` version. `pickle.load` doesn't have a `protocol` argument because the protocol is automatically detected – juanpa.arrivillaga Jul 16 '21 at 20:01
  • So it is just not possible to load a pickled pandas dataframe using `pickle` protocol=4 in python3.8? Seems like there has to be a way. I am not invoking pandas here, though the object is a data frame. – Marc Maxmeister Jul 16 '21 at 20:07
  • I'll try upgrading my pandas version to the latest version to see if that resolves this. Or, is it that the newer pandas version in the 3.8 environment doesn't recognize the structure of an older pandas version? – Marc Maxmeister Jul 16 '21 at 20:08
  • Appears to be caused by pandas between version 0.23 and 0.24. https://github.com/Kaggle/docker-python/issues/519. I am trying to load using pandas v1.2.4. – Marc Maxmeister Jul 16 '21 at 20:17
  • 3
    `pickle` works by loading modules and then reconstructing class objects based on the pickled data. The error says that your pandas doesn't have a needed function, pandas.core.internals.blocks.new_block. If you check the pandas version on both machines, you will likely find that the source has a newer pandas than the destination. The fix is to update pandas. – tdelaney Jul 16 '21 at 20:18
  • This is a risk with pickle - when pickling complex objects, you are dependent on the unpickler code being close enough to the source that the same methods written in the pickle file are still there. If you want to handle a larger range of pandas versions, stick to one of the standard file formats like csv, parquet, etc. – tdelaney Jul 16 '21 at 20:21
  • The issue almost certainly has nothing to do with pickle version, as I stated, rather, the pandas version. And of course you are using pandas, *how else would you create a pandas dataframe*? – juanpa.arrivillaga Jul 16 '21 at 20:38
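The point made in the comments above, that `pickle.load` takes no `protocol` argument because the protocol is detected automatically, can be checked with a minimal sketch using only the standard library. The helper name `pickle_protocol` is hypothetical and exists only for this illustration; it relies on the fact that pickles written with protocol 2 or newer begin with a header byte followed by the protocol number.

```python
import pickle

data = {"a": [1, 2, 3]}

# Serialize the same object with two different protocols.
blob_v2 = pickle.dumps(data, protocol=2)
blob_v4 = pickle.dumps(data, protocol=4)


def pickle_protocol(blob):
    """Hypothetical helper: return the protocol recorded in a pickle, or None.

    Pickles written with protocol 2 or newer start with the PROTO opcode
    (byte 0x80) followed by one byte holding the protocol number.
    Protocols 0 and 1 have no such header, so None is returned for them.
    """
    if blob[:1] == b"\x80":
        return blob[1]
    return None


# loads() has no protocol parameter; it reads the header by itself.
assert pickle.loads(blob_v2) == data
assert pickle.loads(blob_v4) == data

print(pickle_protocol(blob_v2), pickle_protocol(blob_v4))  # prints: 2 4
```

Both blobs load without any hint about the protocol, which is consistent with the failure in the question coming from the pandas version rather than from the pickle protocol.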

2 Answers

9

You need to upgrade pandas to the latest version (1.3.1 worked for me). More precisely, the pandas version installed when the object was written with pickle.dump should be the same as the pandas version installed when it is read back with pickle.load.
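One way to make such a mismatch visible early is to store the library version next to the data. The following is only a sketch under stated assumptions, not something this answer or pandas provides: the helper names `dump_with_version` and `load_with_version_check` are hypothetical, and the caller is assumed to pass in the version string (for a DataFrame that would be `pandas.__version__`). The version is written as a separate, plain pickle ahead of the object, so it can be read even when the object itself cannot be reconstructed.

```python
import pickle


def dump_with_version(obj, path, lib_version):
    """Hypothetical helper: write a small version header, then the object."""
    with open(path, "wb") as f:
        # The header is a plain dict, so it loads without the library.
        pickle.dump({"lib_version": lib_version}, f)
        pickle.dump(obj, f)


def load_with_version_check(path, current_version):
    """Hypothetical helper: refuse to unpickle if the versions differ."""
    with open(path, "rb") as f:
        header = pickle.load(f)  # reads only the first pickle in the file
        written_with = header["lib_version"]
        if written_with != current_version:
            raise RuntimeError(
                f"file was written with version {written_with}, "
                f"but version {current_version} is installed"
            )
        return pickle.load(f)  # reads the second pickle: the object itself
```

With pandas, usage would look like `dump_with_version(df, 'data_frame_111.pkl', pd.__version__)` when writing and `load_with_version_check('data_frame_111.pkl', pd.__version__)` when reading. An exact match is stricter than necessary for many releases, but it replaces an obscure AttributeError with a message that names both versions.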

kimonili
  • 3
    This is probably the right answer, but it poses a serious design flaw. When you use pickle to store data over months or years, it is not stable as a format if the methods to retrieve the data depend on the version of the code used at the time. – Marc Maxmeister Jul 29 '21 at 21:09
  • Yes, I agree. Pickle files are not a good choice for long-term storing. – kimonili Jul 30 '21 at 05:40
  • 1
    Got this error using colab. Fix was to `!pip install --upgrade pandas`. – BSalita Aug 24 '21 at 16:04
  • There's a longer explanation of both the python 3.7 to 3.8 pickle 4-to-5 issues, and the pandas 1.2.x to 1.3.x incompatibility issue with pickles here: https://stackoverflow.com/a/68939962/536538 --- I am still looking for a fix. This is a serious breaking change in pickling. – Marc Maxmeister Nov 08 '21 at 20:51
0
with open('data_frame_111.pkl','rb') as pfile:
    x1 = pickle.load(pfile)

Try changing to:

import pandas as pd
with open('data_frame_111.pkl','rb') as pfile:
    x1 = pd.read_pickle(pfile)

Looks like there have been some changes due to security vulnerability concerns.

  • I tried both approaches, and neither one works, because as the other answer states, the pandas version used affects whether it works. – Marc Maxmeister Jul 29 '21 at 21:10