
I'm trying to pickle a pandas DataFrame to my local directory so I can work on it in another Jupyter notebook. The write appears to succeed at first, but when I try to read the file in a new Jupyter notebook, the read fails.

When I open the pickle file I appear to have written, its only contents are:

Error! /Users/.../income.pickle is not UTF-8 encoded Saving disabled. See console for more details.

I also checked, and the pickle file itself is only a few kilobytes.

Here's my code for writing the pickle:


with open('income.pickle', 'wb', encoding='UTF-8') as to_write:
    pickle.dump(new_income_df, to_write)

And here's my code for reading it:


with open('income.pickle', 'rb') as read_file:
    income_df = pickle.load(read_file)

Also, when I display income_df, I get this output:

Series([], dtype: float64)

It's an empty Series, and most Series methods I call on it raise errors.

If anyone knows a fix for this, I'm all ears. Thanks in advance!

EDIT:

This is the solution I arrived at:

import pickle

with open('cleaned_df', 'wb') as to_write:
    pickle.dump(df, to_write)

with open('cleaned_df', 'rb') as read_file:
    df = pickle.load(read_file)

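This was much simpler than I expected. The main difference from my original code is that open() no longer receives encoding='UTF-8'; binary mode ('wb') doesn't accept an encoding argument.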
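A minimal, self-contained sketch comparing the two versions, assuming a small hypothetical DataFrame in place of new_income_df and a temporary directory instead of the notebook's working directory. It shows that passing encoding= together with 'wb' makes open() raise a ValueError before any file is created, while the version without encoding round-trips the DataFrame intact.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for new_income_df.
df = pd.DataFrame({"income": [50000, 62000, 58000]})

# Use a throwaway directory rather than the working directory.
path = os.path.join(tempfile.mkdtemp(), "income.pickle")

# 1) The original call: binary mode plus an encoding argument.
error_message = None
try:
    with open(path, "wb", encoding="UTF-8") as f:
        pickle.dump(df, f)
except ValueError as exc:
    error_message = str(exc)

file_created = os.path.exists(path)
print(error_message)  # binary mode doesn't take an encoding argument
print(file_created)   # False: open() failed before creating the file

# 2) The working version: binary mode, no encoding.
with open(path, "wb") as f:
    pickle.dump(df, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.equals(df))  # True
```

If an old income.pickle already existed from an earlier run, a failed open() like this would leave that old file untouched, which is worth ruling out when the loaded object looks wrong.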

Joel Porcaro
    For future searchers: the object may have been written successfully even though Jupyter Notebook can't display the pickle file's contents directly and shows 'Error! is not UTF-8 encoded Saving disabled. See Console for more details.' Try unpickling the object and inspecting it; you may find it intact. The original poster was unlucky in that respect, but that is a separate issue. – Kaleb Coberly Feb 11 '21 at 23:22

2 Answers


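Pickle can serialize a pandas DataFrame (pandas even provides DataFrame.to_pickle and pd.read_pickle for this), so getting back an empty Series suggests the file didn't contain the DataFrame you meant to save. If pickling keeps giving you trouble, here are some alternative ways to move data between notebooks.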

1) You can write just the DataFrame's data to a CSV file.

# Write/read a CSV file using the DataFrame's to_csv method and pd.read_csv.
import pandas as pd
new_income_df.to_csv("mydata.csv")
new_income_df2 = pd.read_csv("mydata.csv", index_col=0)  # index_col=0 restores the saved index
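By default, to_csv writes the index as an unnamed first column, so reading the file back needs index_col=0 to recover the original shape. A small sketch, assuming a hypothetical integer-only DataFrame and an in-memory string instead of mydata.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for new_income_df.
df = pd.DataFrame({"income": [50000, 62000, 58000]})

# to_csv() with no path returns the CSV text; the index is written first.
csv_text = df.to_csv()

# Without index_col, the old index comes back as an extra column.
naive = pd.read_csv(io.StringIO(csv_text))
print(list(naive.columns))  # ['Unnamed: 0', 'income']

# With index_col=0, the first column becomes the index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored.equals(df))  # True
```

Keep in mind that CSV is plain text: dtypes such as datetimes or categoricals are not stored, so they need to be re-parsed (for example with parse_dates) after reading, whereas a pickle keeps them.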

2) If your data can be produced by a function in a regular Python module (a .py file), you can import and call that function from a Jupyter notebook. With the autoreload extension, the notebook picks up changes you make to the function without restarting the kernel. See the autoreload documentation: https://ipython.org/ipython-doc/3/config/extensions/autoreload.html

# Saved as "mymodule1.py" (from notebook1.ipynb).
import pandas as pd
def funcdata():
    new_income_df = pd.DataFrame(data=[100, 101])
    return new_income_df

# notebook2.ipynb
%load_ext autoreload
%autoreload 2
import pandas as pd
import mymodule1  # import by module name, without the .py extension
df2 = mymodule1.funcdata()
print(df2)
# Change the data inside funcdata() in mymodule1.py and rerun this cell to see the change here.
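Outside IPython, the standard library's importlib.reload gives a rough, manual version of what %autoreload automates (autoreload itself does more, such as updating existing function objects in place). This sketch writes a hypothetical mymodule1.py into a temporary directory, imports it, changes the data, and reloads it; the temporary directory and the write_module helper are illustrative assumptions, not part of the answer above.

```python
import importlib
import os
import sys
import tempfile

# Skip .pyc caching so a rewritten source file is always re-read.
sys.dont_write_bytecode = True

# Hypothetical module location; in practice this is your project folder.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "mymodule1.py")

SOURCE = """\
import pandas as pd

def funcdata():
    new_income_df = pd.DataFrame(data={data})
    return new_income_df
"""


def write_module(values):
    """Hypothetical helper: (re)write mymodule1.py with the given list as data."""
    with open(module_path, "w", encoding="utf-8") as f:
        f.write(SOURCE.format(data=values))


write_module([100, 101])
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import mymodule1

first = mymodule1.funcdata()

# Edit the data inside funcdata(), then reload the module by hand.
write_module([100, 101, 102])
mymodule1 = importlib.reload(mymodule1)
second = mymodule1.funcdata()

print(len(first), len(second))  # 2 3
```

With %autoreload 2 enabled in the notebook, the reload step happens automatically before each cell runs.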

3) You can share data between Jupyter notebooks using the %store magic command.
Source: https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
See also: Share data between IPython Notebooks

# %store example, first Jupyter notebook.
from sklearn import datasets
dataset = datasets.load_iris()
%store dataset

# In a new Jupyter notebook, read it back.
%store -r dataset
Jennifer Yoon
  • Thank you very much! I was able to work around my issue using the examples you gave. – Joel Porcaro Jul 14 '19 at 00:56
  • Hi Joel, great! I'm a newbie programmer myself. I'm curious: what was your solution? You can edit your question to show what worked for you. – Jennifer Yoon Jul 14 '19 at 01:04
  • The solution was pretty simple actually. I edited my post above for you to see it. – Joel Porcaro Jul 26 '19 at 16:57
  • Interesting. It seems you were able to reformat a dataframe object to save it as a pickle binary object. ('cleaned_df', 'wb') I will have to try that out. Happy coding. :-) – Jennifer Yoon Aug 01 '19 at 18:27

Use pandas' own reader to load the pickled file:

import pandas as pd
df = pd.read_pickle('cleaned_df')
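pd.read_pickle can load a file written with the standard library's pickle.dump, and DataFrame.to_pickle is the matching writer, so no open() call is needed at all. A minimal sketch, assuming a hypothetical DataFrame and a temporary path named like the question's cleaned_df:

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for the cleaned DataFrame.
df = pd.DataFrame({"income": [50000, 62000, 58000]})
path = os.path.join(tempfile.mkdtemp(), "cleaned_df")

# Written with the standard library, as in the question's edit...
with open(path, "wb") as f:
    pickle.dump(df, f)

# ...and read back with pandas' reader.
from_plain_pickle = pd.read_pickle(path)

# pandas' own writer/reader pair handles the file for you.
df.to_pickle(path)
from_to_pickle = pd.read_pickle(path)

print(from_plain_pickle.equals(df), from_to_pickle.equals(df))  # True True
```

As with any pickle, only load files you trust, since unpickling can execute arbitrary code.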

Alez