Context
I work with python 3.9.6 and pandas 1.3.0.
My colleague works with python 3.6.12 and pandas 1.1.5.
I want to create a dataframe and share it with my colleague, without asking them to update their environment (that request would incur some hassle).
Question
How can I write out a dataframe to a file using my newer python/pandas versions in a way that their older python/pandas versions can read it in as a dataframe?
What I've tried or looked into
Default .to_pickle() method
If in the newer python environment I write:
df.to_pickle(r"C:\somepath\file.bz2")
and in the older python environment I try:
df = pd.read_pickle(r"C:\somepath\file.bz2")
I get:
ValueError: unsupported pickle protocol: 5
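A likely explanation for this first error, as far as I can tell: pickle protocol 5 was added in Python 3.8, and to_pickle() appears to default to pickle.HIGHEST_PROTOCOL, so a file written from Python 3.9 uses a protocol that Python 3.6 does not know. A minimal sketch for checking what each interpreter supports (run it in both environments; the variable names are only illustrative):

```python
import pickle
import sys

# Highest pickle protocol this interpreter can read and write.
# Python 3.8+ reports 5; Python 3.6 and 3.7 report 4.
highest = pickle.HIGHEST_PROTOCOL
python_version = sys.version_info[:2]

print(f"Python {python_version[0]}.{python_version[1]} supports pickle protocols 0..{highest}")
```

Any protocol number above the value printed in the older environment cannot be loaded there.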
Specifying a protocol version in the .to_pickle() method
Fine, I thought, I'll specify a different protocol.
df.to_pickle(r"C:\somepath\file.bz2", protocol=3)
However, when I try to load it in the older python environment, I get:
AttributeError: module 'pandas.core.internals.blocks' has no attribute 'new_block'
This error remains for all protocol versions from 0 to 5.
Previous question on protocol version
I found this question, which only has the answer that the pandas versions must match.
I find it hard to believe that's the only solution, as then, what's the point of having multiple pickle protocols which are meant to be backward compatible?
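One way to look at why changing the protocol alone does not help, sketched with the standard library only: the protocol governs the byte-level encoding, but every protocol still records the import path of the class (or reconstruction function) that the reader must be able to resolve. Here OrderedDict is just a stand-in for a DataFrame, chosen for illustration.

```python
import pickle
from collections import OrderedDict

# Stand-in object; a DataFrame would reference pandas internals in the same way.
obj = OrderedDict(a=1)

# Serialize the same object with every protocol this interpreter supports.
payloads = {
    proto: pickle.dumps(obj, protocol=proto)
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}

for proto, data in payloads.items():
    # The class name is embedded in the payload regardless of protocol.
    print(proto, b"OrderedDict" in data)
```

If the reading side cannot import the recorded name (as with new_block in pandas 1.1.5), loading fails no matter which protocol was used to write the file.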
Previous question on the new_block attribute
This question mentions the same error with the missing new_block attribute. Again, the answer is to update the pandas version (over which I have no control at the moment).
Downgrading the newer python/pandas versions
I could downgrade my newer python/pandas to match my colleague's versions.
I haven't tried it yet, but I assume it would work. However, that would really be a last resort, since I would then need a separate "low version" environment just to work with this one colleague.
Exporting to CSV
This works, but it loses some dataframe-specific features like data types and NaN values, so I don't consider this a valid workaround.
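To illustrate the data type loss, here is a small sketch of a CSV round trip. The column names and values are made up for the example, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical frame with a datetime column and a categorical column.
df = pd.DataFrame({
    "when": pd.to_datetime(["2021-07-01", "2021-07-02"]),
    "label": pd.Series(["a", "b"], dtype="category"),
})

# Write to CSV and read it straight back with default settings.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

dtypes_before = df.dtypes.astype(str).to_dict()
dtypes_after = restored.dtypes.astype(str).to_dict()
print(dtypes_before)
print(dtypes_after)
```

With default read_csv settings, neither the datetime nor the categorical dtype comes back; the reader would have to restate them by hand.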
Pickling separately
I thought maybe the issue lies in the pandas .to_pickle() or .read_pickle() method, so I tried using the pickle library directly to write the file (using protocol 3):
import pickle
with open('file.pkl', 'wb') as f:
    pickle.dump(df, f, 3)
... and then read it in the older python environment:
import pickle
with open('file.pkl', 'rb') as f:
    df = pickle.load(f)
Unfortunately, I am still met with
AttributeError: module 'pandas.core.internals.blocks' has no attribute 'new_block'
Converting to a dict, then pickling that
Per the suggestion in the comments I tried:
ddf = df.to_dict()
with open('file.pkl', 'wb') as f:
    pickle.dump(ddf, f, 3)
But then, when I try to read it in the older environment, I get:
AttributeError: Can't get attribute '_unpickle_timestamp' on <module 'pandas._libs.tslibs.timestamps
My DataFrame has a timestamp column in it, which apparently cannot be unpickled by the older pandas version.
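A small sketch of why the dict route still depends on pandas: to_dict() keeps datetime values as pandas Timestamp objects, so the pickled dict still refers to pandas internals. The frame below is made up for illustration.

```python
import pickle

import pandas as pd

# Hypothetical frame with one timestamp column and one float column.
df = pd.DataFrame({
    "when": pd.to_datetime(["2021-07-01"]),
    "value": [1.5],
})

as_dict = df.to_dict()
first_when = as_dict["when"][0]

# The value is still a pandas object, not a plain datetime.datetime.
print(type(first_when))

# The pickled dict therefore still names a pandas module.
payload = pickle.dumps(as_dict, protocol=3)
print(b"pandas" in payload)
```

So the dict is only as portable as the objects it contains, and here those are still pandas types.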