How to understand python pickle files

Question

I have a problem that I haven't been able to solve. I'm studying a Convolutional Neural Network for traffic sign recognition, but I don't understand how Python .p files are organized or what they contain, and I don't know how to create a .p file containing my own images and the labels associated with them. Can anyone help me? Below are the first lines of code, which load the data in the dataset. Thanks a lot.

import pickle
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
import cv2
import scipy.misc
from PIL import Image

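# Paths to the pickled training, validation and test splits
training_file = 'traffic-signs-data/train.p'
validation_file = 'traffic-signs-data/valid.p'
testing_file = 'traffic-signs-data/test.p'

# Pickle files must be opened in binary mode ('rb'); pickle.load rebuilds the stored Python object
with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(validation_file, mode='rb') as f:
    valid = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)

# Each loaded object is indexed by the keys 'features' (the images) and 'labels' (the class of each image)
X_train, y_train = train['features'], train['labels']
X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']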

Please paste the code in as a code block rather than linking to an external screen-shot — scnerd, Mar 12 '18 at 16:52
Indent it all by four spaces, or just highlight all the code and use the "code block" button at the top of the text editor. I'd do it myself if it weren't so much code to copy by hand. — scnerd, Mar 12 '18 at 16:54
Pickle is supposed to be a completely opaque format; there is no need to understand how it is organised. — Daniel Roseman, Mar 12 '18 at 16:56
Also, this question is too broad. Your code example doesn't actually clarify what you're asking, it simply gives a (seemingly) working example of using `pickle`. What specifically are you trying to do, and not able to do, and what is the code that's causing the problem? — scnerd, Mar 12 '18 at 16:57
I don't understand how the images and their associated labels are organized in these pickle files. I would like to create a .p file with my own images and labels, but I don't know how to make such a file. I've tried printing the contents of the .p file and extracting information from it to understand how the images are organized, but I still don't understand. Specifically, I would like to train the network with black-and-white images, so I need to create .p files containing these images, because the neural network reads the images from pickle files and manipulates them as matrices. — Leonardo Di Domenico, Mar 12 '18 at 17:06

Samuel GIFFARD · Answer 1 · 2018-03-12T17:18:04.440

This may be of interest to you: https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled

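The thing is, if you do not have the classes that were used when the data was pickled, you won't be able to unpickle it: a pickle stores only a reference to each object's class (its module and name), not the class's code.

So the data in your .p files may be useless to you without those classes.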
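A small sketch of what "a reference to the class" means in practice. It pickles a `datetime.date` and checks that only the module name ends up in the bytes, then unpickles a hand-written protocol-0 pickle that points at a module that doesn't exist (`no_such_module` and `Thing` are made-up names for the demo). Since unpickling imports whatever modules the file names, only unpickle files from sources you trust.

```python
import datetime
import pickle

# Pickle an instance of a class that lives in a module (datetime.date).
blob = pickle.dumps(datetime.date(2018, 3, 12))

# The pickle records the module and class *names*, not the class's code.
print(b"datetime" in blob)

# A hand-written protocol-0 pickle referring to a module that does not exist:
# b"c" is the GLOBAL opcode (module name, newline, attribute name, newline), b"." is STOP.
broken = b"cno_such_module\nThing\n."
try:
    pickle.loads(broken)
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    print("cannot unpickle:", exc)
```

The same failure happens with a real .p file whose classes (or the library that defines them) are not installed on the machine doing the loading.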

However, if you own the whole pipeline (and will be creating the .p files yourself), know that pickle is just a way to serialize and deserialize data.

So you can have one part of your software dedicated to populating your .p files (try loading your images with Image from Pillow, the maintained fork of PIL, and then pickle your list of images).

You should then be able to unpickle them in the part of your software that you're showing above.
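A minimal sketch of such a "populating" step, assuming the network expects the same layout the question's loading code reads: an object with a `'features'` key holding the images stacked into one numpy array and a `'labels'` key holding one class id per image. The helper name `save_dataset`, the file name `my_train.p`, the 32x32 size and the label values are all hypothetical; synthetic arrays stand in for real images so the example runs on its own.

```python
import os
import pickle
import tempfile

import numpy as np


def save_dataset(path, images, labels):
    """Pickle images and labels as {'features': ..., 'labels': ...}.

    The key names come from the question's loading code; stacking the
    images along the first axis is an assumption about what the network expects.
    """
    features = np.stack(images)  # all images must share one size: (N, H, W) or (N, H, W, C)
    labels = np.asarray(labels)  # (N,) class ids
    if len(features) != len(labels):
        raise ValueError("need exactly one label per image")
    with open(path, "wb") as f:
        pickle.dump({"features": features, "labels": labels}, f)


# Hypothetical stand-ins for real black-and-white images: three 32x32 arrays.
# With Pillow, an image file can be turned into such an array with
# np.asarray(Image.open(filename).convert("L")).
images = [np.full((32, 32), value, dtype=np.uint8) for value in (0, 128, 255)]
labels = [14, 1, 14]  # hypothetical class ids

path = os.path.join(tempfile.mkdtemp(), "my_train.p")
save_dataset(path, images, labels)

# Reading it back works exactly like the question's loading code.
with open(path, mode="rb") as f:
    train = pickle.load(f)
X_train, y_train = train["features"], train["labels"]
print(X_train.shape, y_train)
```

Grayscale images give `features` of shape (N, H, W); if the network was written for colour input it may expect a channel axis, which `np.expand_dims(X_train, -1)` adds.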

This is just a way to do your preprocessing beforehand and avoid redoing it every time.
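A sketch of that "preprocess once, reuse the pickle" pattern. `load_or_build` and `preprocess` are hypothetical names, and `preprocess` only returns dummy arrays in place of whatever real resizing or grayscale conversion you would do.

```python
import os
import pickle
import tempfile

import numpy as np


def load_or_build(cache_path, build):
    """Return the pickled object at cache_path, or call build() once and cache its result."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


calls = []


def preprocess():
    # Hypothetical stand-in for expensive preprocessing of the raw images.
    calls.append(1)
    return {"features": np.zeros((4, 32, 32), dtype=np.float32),
            "labels": np.array([0, 1, 2, 3])}


cache = os.path.join(tempfile.mkdtemp(), "preprocessed.p")
first = load_or_build(cache, preprocess)   # runs preprocess() and writes the cache
second = load_or_build(cache, preprocess)  # reads the cache; preprocess() is not called again
```

Delete the cache file whenever the preprocessing code changes, otherwise the stale pickle keeps being loaded.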

Another way to do this would be to dump the data as JSON, with your images base64-encoded and decoded. See here for a quick example of the latter part: https://stackoverflow.com/a/31826470/8933502
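A minimal sketch of the JSON route for numpy image arrays. Unlike the linked answer, which works with image files, this encodes the raw array bytes and stores the shape and dtype alongside them so the array can be rebuilt; the record layout and helper names are hypothetical.

```python
import base64
import json

import numpy as np


def image_to_json_record(image, label):
    """Encode one image array and its label as a JSON-friendly dict."""
    return {
        "label": int(label),
        "shape": list(image.shape),
        "dtype": str(image.dtype),
        "data": base64.b64encode(image.tobytes()).decode("ascii"),
    }


def json_record_to_image(record):
    """Rebuild the image array and label from a dict made by image_to_json_record."""
    raw = base64.b64decode(record["data"])
    image = np.frombuffer(raw, dtype=record["dtype"]).reshape(record["shape"])
    return image, record["label"]


image = np.arange(16, dtype=np.uint8).reshape(4, 4)  # tiny stand-in for a real image
text = json.dumps([image_to_json_record(image, 3)])
restored, label = json_record_to_image(json.loads(text)[0])
```

JSON files can be read from any language, but base64 makes the data about a third larger than the raw bytes, and `np.frombuffer` returns a read-only array (call `.copy()` if you need to modify it).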

Good luck!

score 0 · Answer 2 · answered Mar 12 '18 at 17:10

pickle isn't just one thing, so there's no single answer to your question, but the whole point of pickle is that it shouldn't matter. In general, almost any Python object can be pickled as-is and unpickled without any special knowledge of what's being pickled or how. It's simply a way to freeze an in-memory Python object on disk. It's up to you as the developer to know what data and types went into a pickle file and what you should expect back out, or to handle errors appropriately.
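If you don't know what went into a pickle, you can still load it (from a trusted source) and look at what comes out. The helper `outline` is a hypothetical name; it walks dicts, lists/tuples and numpy arrays and reports types and shapes. The in-memory sample only mirrors the `'features'`/`'labels'` keys used in the question's code; passing `pickle.load(f)` of the real train.p instead would show that file's actual layout.

```python
import pickle

import numpy as np


def outline(obj, indent=0):
    """Return lines describing an unpickled object: types, dict keys, array shapes."""
    pad = " " * indent
    if isinstance(obj, dict):
        lines = [f"{pad}dict with {len(obj)} keys"]
        for key, value in obj.items():
            lines.append(f"{pad}  {key!r}:")
            lines.extend(outline(value, indent + 4))
        return lines
    if isinstance(obj, np.ndarray):
        return [f"{pad}ndarray shape={obj.shape} dtype={obj.dtype}"]
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{type(obj).__name__} of length {len(obj)}"]
        if obj:
            lines.extend(outline(obj[0], indent + 4))  # describe only the first element
        return lines
    return [f"{pad}{type(obj).__name__}: {obj!r}"]


# Stand-in for pickle.load(f): two 32x32 RGB images and their labels.
sample = pickle.loads(pickle.dumps({
    "features": np.zeros((2, 32, 32, 3), dtype=np.uint8),
    "labels": np.array([0, 1]),
}))
print("\n".join(outline(sample)))
```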

There are a few exceptions. Some types, such as an open HTTP connection, can't meaningfully be pickled, and unpickling can fail if the Python or library versions have changed since the object was pickled (e.g., trying to unpickle a Python 3 pickle in Python 2). In general, though, it doesn't matter what you're pickling. If you need greater resilience to change, you should use a serialization format that isn't Python- or library-specific.
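One concrete piece of the version issue is the pickle protocol. Python 2 can read protocols 0 to 2 only, while Python 3 writes a newer protocol by default, so a pickle meant for both has to be written with `protocol=2`. A short sketch (the `data` dict is just an example):

```python
import pickle

data = {"labels": [1, 2, 3]}

# Protocol 2 is the highest protocol Python 2 can read.
py2_compatible = pickle.dumps(data, protocol=2)
newest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Binary pickles (protocol 2+) start with the PROTO opcode b"\x80" followed by the protocol number.
print(py2_compatible[:2], newest[:2])
```

Pinning the protocol only covers the file format; the classes inside (numpy arrays, for example) still have to be importable and compatible on the reading side.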
