
I'm currently playing around with some neural networks in TensorFlow - I decided to try working with the CIFAR-10 dataset. I downloaded the "CIFAR-10 python" dataset from the website: https://www.cs.toronto.edu/~kriz/cifar.html.

In Python, I also tried directly copying the code that is provided to load the data:

    def unpickle(file):
        import pickle
        with open(file, 'rb') as fo:
            dict = pickle.load(fo, encoding='bytes')
        return dict

However, when I run this, I end up with the following error: _pickle.UnpicklingError: invalid load key, '\x1f'. I've also tried opening the file using the gzip module (with gzip.open(file, 'rb') as fo:), but this didn't work either.
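For what it's worth, '\x1f' is the first byte of the gzip magic number (0x1f 0x8b), so checking the first two bytes of the file shows whether the thing being unpickled is still the compressed archive. A small sketch (the helper name and file names are my own, not from the dataset page):

```python
def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"

# e.g. looks_gzipped("cifar-10-python.tar.gz") would be True, while an
# extracted batch file such as "data_batch_1" should return False.
```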

Is the dataset simply bad, or is this an issue with my code? If the dataset is bad, where can I obtain a proper copy of CIFAR-10?

MLavrentyev
  • Try removing the `encoding='bytes'`? – cs95 Jul 15 '17 at 18:54
  • I tried that, and the same error persisted. – MLavrentyev Jul 15 '17 at 18:56
  • Okay... do you have keras? – cs95 Jul 15 '17 at 19:01
  • I installed tensorflow through pip, so `pip install tensorflow`. Not sure if that'd also install keras, but I'm assuming no. – MLavrentyev Jul 15 '17 at 19:02
  • This may help then: from keras.datasets import cifar10 – cs95 Jul 15 '17 at 19:04
  • I'll take a look at that. It just piques me why the "official" dataset isn't working, with the code and data that's provided on the website – MLavrentyev Jul 15 '17 at 19:06
  • I'm surprised too. It should work fine. That code is probably dated. There's something more that needs to be done that I don't know. – cs95 Jul 15 '17 at 19:10
  • I don't know if this has been resolved yet, but I downloaded the python dataset and pickle works with that dataset. I believe that the dataset that is being used in the tensorflow example is the binary dataset and can't be unpickled. – marqs Oct 13 '17 at 19:09

6 Answers

2

Extract your *.tar.gz file and use this code:

    from six.moves import cPickle

    # latin1 decodes the Python 2 byte strings stored in the batch files
    with open("path/data_batch_1", 'rb') as f:
        datadict = cPickle.load(f, encoding='latin1')

    X = datadict["data"]
    Y = datadict["labels"]
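The rows in datadict["data"] are flat uint8 vectors of length 3072 (1024 red, then 1024 green, then 1024 blue values per the dataset page); a minimal sketch of reshaping them into 32x32 RGB images (the helper name is my own):

```python
import numpy as np

def to_images(flat_rows):
    """Reshape CIFAR-10 rows of shape (N, 3072) into (N, 32, 32, 3).

    Each row stores all red values first, then green, then blue,
    so reshape to (N, 3, 32, 32) and move the channel axis last.
    """
    arr = np.asarray(flat_rows, dtype=np.uint8)
    return arr.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
```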
Mehralian
1

Just extract your tar.gz file; you will get a folder containing data_batch_1, data_batch_2, ...

After that, just use the code provided to load the data into your project:

def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict

dict = unpickle('data_batch_1')
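One detail worth knowing: the batch files were pickled under Python 2, so with encoding='bytes' the dictionary keys come back as bytes, not str. A small sketch (reusing the unpickle helper above; the key names are from the CIFAR-10 page):

```python
import pickle

def unpickle(file):
    with open(file, 'rb') as fo:
        return pickle.load(fo, encoding='bytes')

# Index with bytes keys:
#   batch = unpickle('data_batch_1')
#   images = batch[b'data']    # uint8 array of shape (10000, 3072)
#   labels = batch[b'labels']  # list of 10000 ints in 0..9
```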

0

It seems like you need to unzip the *.gz file and then extract the *.tar file to get a folder of data batches. Afterwards you can apply pickle.load() to these batches.
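Both steps can also be done from Python in one go, since the standard tarfile module handles the gzip layer and the tar layer together. A minimal sketch (the archive path is an assumption; the official download unpacks into a cifar-10-batches-py folder):

```python
import tarfile

def extract_cifar10(archive_path, dest="."):
    """Extract a CIFAR-10 .tar.gz archive (gzip + tar in one step)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)

# extract_cifar10("cifar-10-python.tar.gz") would create a
# cifar-10-batches-py/ folder with data_batch_1..5 and test_batch.
```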

Peter Guan
0

I was facing the same problem using Jupyter (VS Code) and Python 3.8/3.7. I tried to edit the source cifar10.py, but without success.
The solution for me was to run these two lines of code in a separate, plain .py file:

from tensorflow.keras.datasets import cifar10
cifar10.load_data()

After that, it worked fine in Jupyter.

Ahmad Asmndr
0

Try this:

    import _pickle as cPickle
    import gzip

    with gzip.open(path_of_your_cpickle_file, 'rb') as f:
        var = cPickle.load(f)
0

Try it this way:

    import pickle
    import gzip

    with gzip.open(path, "rb") as f:
        loaded = pickle.load(f, encoding='bytes')

It works for me.