I have a data analysis pipeline that consists of multiple steps. I built it with Snakemake (which is new to me); the output of every task (and the input of the next task) is a pickle file containing either a DataFrame or a list of DataFrames. Everything works, except that I cannot open the pickle files manually. Of note, the pipeline runs in a dedicated conda environment.

import _pickle
with open("testb/first/out/stacks.pkl", "rb") as f:
    data = _pickle.load(f)

I get this error:

AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from 'C:\\Users\\sebde\\anaconda3\\envs\\dbm\\lib\\site-packages\\pandas\\_libs\\internals.cp39-win_amd64.pyd'

Python 3.10.2, Snakemake-minimal 7.0.4 (as per documentation, I'm on Windows), Pandas 1.4.1

SultanOrazbayev
SebDL
  • The first thing that came to my mind was to check the compatibility between pandas and pickle, but I assume that is not the issue. The second is a recommendation: you could try converting the pickle to csv so that you can open it with any version of pandas. Unless the dataframe contains objects, csv should be fine. – Mikolaj Mar 15 '22 at 16:22
  • Some of the dataframes contain objects indeed. csv conversion is not applicable in my case. – SebDL Mar 16 '22 at 08:45
  • Some googling suggests this may be due to different versions of pandas used to write and read the pickle (e.g. https://stackoverflow.com/questions/70944402/unable-to-open-spydata-in-spyder-cant-get-attribute-unpickle-block-on) - could that be the case for you? Check the version of pandas inside and outside the conda env. – dariober Mar 16 '22 at 08:56
  • Items beginning with underscore are internals and not meant to be accessed. Maybe you switched to that when using Pandas `read_pickle` wasn't working? If you tried other things then they should be in the post. Did you use Pandas to pickle your objects? As others point out, use the same Pandas version to unpickle. Is this specific to snakemake being involved? – Wayne Mar 21 '22 at 15:09

1 Answer

One easy thing to try (but which might not work) is to use pickle rather than _pickle:

# the "as" alias avoids having to adjust downstream code,
# but if that is not a concern, a regular import is better
# (i.e. "import pickle")
import pickle as _pickle
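
If the import change alone does not help, the comments point at a version mismatch between the environment that wrote the pickle and the one reading it. The following is only a minimal sketch for checking that, using the standard library; `describe_environment` and `load_pickle` are hypothetical helper names, not part of pandas or Snakemake. Running it both inside the pipeline's conda environment and in the interpreter used to open the file manually should show whether the Python or pandas versions differ.

```python
import pickle
import sys
from importlib import metadata


def describe_environment(packages=("pandas",)):
    """Return the interpreter details and the installed version of each package."""
    info = {"python": sys.version.split()[0], "executable": sys.executable}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package is not installed for this interpreter
    return info


def load_pickle(path):
    """Load a pickle file; on AttributeError, report the reading environment."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except AttributeError as exc:
        # Typically raised when the pickle references an attribute that the
        # installed library version does not provide.
        raise RuntimeError(
            f"Could not unpickle {path!r}: {exc}. "
            f"Reading environment: {describe_environment()}"
        ) from exc


if __name__ == "__main__":
    # Compare this output between the two environments.
    print(describe_environment())
```

If the reported versions differ, reading the file from the same conda environment that the pipeline uses is the simplest thing to try.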
SultanOrazbayev