
I am working through practice code that uses the MNIST data for deep learning, in Python 3.4.

The original code is:

import gzip
import _pickle as cPickle

def load_data():
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = cPickle.load(f)
    f.close()
    return (training_data, validation_data, test_data)
def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    ....

However, it raises a UnicodeDecodeError. Following suggestions found on the Internet, I changed cPickle.load(f) to pickle.load(f, encoding='latin1').

But the same error still occurs when I run it in the shell:

>>> training_data, validation_data, test_data = \
... mnist_loader.load_data_wrapper() \
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\E\Deep Learning Tutorial\neural-networks-and-deep-learning-master\src\mnist_loader.py", line 68, in load_data_wrapper
    tr_d, va_d, te_d = load_data()
  File "C:\E\Deep Learning Tutorial\neural-networks-and-deep-learning-master\src\mnist_loader.py", line 43, in load_data

The traceback points to this line:

f = gzip.open('../data/mnist.pkl.gz', 'rb')

The error is the same as before; it is only reported on a different line:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

How can I fix this problem?

martineau
Litchy
  • Have you tried putting the line `# -*- coding: utf-8 -*-` at the very top of your file? – coder3521 Dec 04 '17 at 07:33
  • @csharpcoder I have tried your suggestion. After adding it, the error is the same, but the reported error line becomes `f.close()`, which is really strange. – Litchy Dec 04 '17 at 07:49
  • @csharpcoder Also, if I add a comment line `#training_data, validation_data, test_data = cPickle.load(f)`, then the reported error line becomes `#training_data, validation_data, test_data = cPickle.load(f)`. – Litchy Dec 04 '17 at 07:51
  • Try using `pickle.load(f, encoding='bytes')`. The `encoding` argument tells pickle how to decode 8-bit string instances pickled by Python 2 and defaults to `'ASCII'`. – martineau Dec 04 '17 at 08:23
  • @martineau I tried your code and still have the same problem, and the reported error line is `f.close()`, which is strange. That line itself cannot be wrong; it seems that whatever line sits at that position gets reported. For example, the reported line is now `f.close()`, but if I add a comment line `#...` above `f.close()`, then the comment line is reported as the error line. – Litchy Dec 04 '17 at 08:49
  • It is hard to understand how `f.close()` could cause a `UnicodeDecodeError`. If you upload the `mnist.pkl.gz` file somewhere and provide a link to it so I can download it for testing purposes, I'll see if I can recreate the problem and find a solution for you. – martineau Dec 04 '17 at 19:11
  • @martineau The link is https://github.com/mnielsen/neural-networks-and-deep-learning/archive/master.zip, thank you very much for your kindness! – Litchy Dec 05 '17 at 06:45

1 Answer


First of all, I was able to reproduce the problem using the mnist.pkl.gz data file extracted from the https://github.com/mnielsen/neural-networks-and-deep-learning/archive/master.zip archive I downloaded. The following exception is raised from the pickle.load(f) call:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

However, the error went away when I added the encoding='bytes' argument to the pickle.load() call, as I suggested in a comment under your question.
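To illustrate why the encoding argument matters, here is a minimal sketch. It does not use mnist.pkl.gz; instead it assumes a tiny hand-built pickle byte string (`py2_pickle`, a hypothetical stand-in) of the kind Python 2 produces for an 8-bit `str` containing the byte 0x90. It only shows how the three encoding choices behave on such a string, not what is actually stored in the MNIST file.

```python
import pickle

# Hypothetical stand-in for data pickled by Python 2:
# opcode 'U' (SHORT_BINSTRING), length 1, the byte 0x90, then '.' (STOP).
py2_pickle = b"U\x01\x90."

# Default encoding is ASCII, so the non-ASCII byte cannot be decoded.
try:
    pickle.loads(py2_pickle)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# encoding='bytes' leaves Python 2 8-bit strings as bytes objects.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

# encoding='latin1' maps every byte to the code point with the same value.
as_latin1 = pickle.loads(py2_pickle, encoding="latin1")

print(default_failed, as_bytes, repr(as_latin1))
```

With `encoding='bytes'` the old strings come back as `bytes`, while `encoding='latin1'` turns them into `str`; which one is more convenient depends on how the loaded values are used afterwards.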

Another change was replacing import _pickle as cPickle with just import pickle; however, I don't think that's significant (see What difference between pickle and _pickle in python 3?).

Another difference that might be significant, however, is that I'm using Python 3.6.3 on Windows.

import gzip
import pickle

def load_data():
    f = gzip.open('mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = \
        pickle.load(f, encoding='bytes')  # Note encoding argument value.
    f.close()
    return (training_data, validation_data, test_data)

def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    print('gzipped pickled data loaded successfully')

load_data_wrapper()

A digression: The load_data() function could be written a little more succinctly like this:

def load_data():
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        training_data, validation_data, test_data = \
            pickle.load(f, encoding='bytes')
    return training_data, validation_data, test_data
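As a quick check of the `with` form, the following sketch round-trips a small tuple through a gzipped pickle in a temporary directory. The `sample` tuple and the file name `sample.pkl.gz` are made up for illustration and are not the MNIST data; the point is only that the file object is closed automatically when the `with` block ends.

```python
import gzip
import os
import pickle
import tempfile

# Hypothetical stand-in for the three MNIST splits.
sample = ([1, 2], [3], [4])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.pkl.gz")

    # Write a gzipped pickle so there is something to load.
    with gzip.open(path, "wb") as f:
        pickle.dump(sample, f)

    # Load it back; the with statement closes f even if load() raises.
    with gzip.open(path, "rb") as f:
        loaded = pickle.load(f, encoding="bytes")

    closed_after_with = f.closed

print(loaded == sample, closed_after_with)
```

Because the file is closed on exit from the block, no explicit `f.close()` call is needed, and the file is not left open if `pickle.load()` raises.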
martineau
  • I think I understand why my first attempt failed. Every time the code is changed, I have to terminate the Python shell and rerun it, or the changed code does not take effect. Previously, I did not terminate the shell. Your comment is the correct answer, thank you! – Litchy Dec 06 '17 at 01:53
  • Adding encoding='bytes' as 2nd argument for load(..) resolved the UnicodeDecodeError for me in Python 3.7 – KGhatak Feb 06 '19 at 06:04
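Regarding the stale-code behaviour described in the first comment above: an already-imported module is cached in the running interpreter, so editing the file has no effect until the module is loaded again. Restarting the shell is one way; `importlib.reload()` is another. The sketch below assumes a throwaway module named `demo_loader` with an `ENCODING` constant, both hypothetical, written to a temporary directory purely to show the effect.

```python
import importlib
import os
import sys
import tempfile

# Demo-only setting: skip bytecode caching so the edited source is always re-read.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_loader.py")  # hypothetical module
    with open(path, "w") as fh:
        fh.write("ENCODING = 'ascii'\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    demo_loader = importlib.import_module("demo_loader")
    before = demo_loader.ENCODING

    # Edit the file on disk; the imported module object is unchanged.
    with open(path, "w") as fh:
        fh.write("ENCODING = 'latin1'\n")
    stale = demo_loader.ENCODING

    # Re-execute the module's source in the same interpreter session.
    importlib.reload(demo_loader)
    after = demo_loader.ENCODING

    sys.path.remove(tmp)

print(before, stale, after)
```

In the shell from the question, the equivalent step would be calling `importlib.reload(mnist_loader)` after editing mnist_loader.py. Names bound earlier with `from module import name` keep pointing at the old objects, so restarting the shell remains the simplest reliable option.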