I am quite new to Python and am looking at some AI code. I need to read a file containing the training data. The code provided for this part looks like this:

import _pickle as cPickle
import gzip
import numpy as np

f = gzip.open('../data/mnist.pkl.gz', 'rb')
training_data, validation_data, test_data = cPickle.load(f)
f.close()

However, I get this error:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    training_data, validation_data, test_data = cPickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

Can someone tell me why this happens and how to fix it, or what else I should use to read the file? (The code I am using is from a fairly reliable source, so it should work out of the box.)

  • Can you please include the *full traceback* of the error, and not just the last line? – Martijn Pieters Jun 22 '18 at 22:31
  • Sorry! I fixed it. – Kelly Jones Jun 22 '18 at 22:45
  • Possible duplicate of [UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)](https://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20) – jman005 Jun 22 '18 at 22:49
  • Try setting the encoding to `utf8` when opening the file: `f = gzip.open('../data/mnist.pkl.gz', 'rb', encoding='utf8')`. – Abdou Jun 22 '18 at 22:51
  • @Abdou I tried that, as I saw it in a similar post here, but I get this error: `ValueError: Argument 'encoding' not supported in binary mode` – Kelly Jones Jun 22 '18 at 23:03
  • Don't open it in binary mode. Pickle files are created in text mode by default. Try opening the file with `'r'` instead of `'rb'`. – blhsing Jun 22 '18 at 23:23
  • @blhsing This is [`gzip.open`](https://docs.python.org/3/library/gzip.html#gzip.open), so `'r'` means the same thing as `'rb'`. – abarnert Jun 22 '18 at 23:36
  • @abarnert Right thanks. I meant to say `'rt'`, not `'r'`. – blhsing Jun 22 '18 at 23:39
  • @blhsing Also, `pickle` defaults to the binary protocol 3, not the text protocol 0, in Python 3. (And this is clearly Python 3 code, although oddly disguised to look like Python 2 code for some reason—I have no idea why anyone would `import _pickle as cPickle`…). And protocol 0 is designed to be read as bytes anyway. (There was no Unicode support when it was invented in Python 1.something…) – abarnert Jun 22 '18 at 23:39
  • @blhsing I tried `f = gzip.open('../data/mnist.pkl.gz', 'rt', encoding='utf8')` but I get this error: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte` – Kelly Jones Jun 22 '18 at 23:45
  • The first question is: why are you using `_pickle` instead of just using `pickle`? Does the problem just go away if you change that? Second, where did this pickle file come from? If it was generated by Python 2.x, you probably need to pass an explicit encoding—but the place to pass that is in `pickle.load`, not in opening the file. If it was generated by Python 3.0-3.2, there's a different problem that I forget the workaround for (but could probably find). – abarnert Jun 22 '18 at 23:45
  • If it was generated by 3.3+, the encoding is probably just a red herring, and it's trying to read non-strings as strings, and you'll need to use `pickletools` to debug it. – abarnert Jun 22 '18 at 23:45
  • @jman005 It's definitely not a duplicate of that question. That one's about using `str` instead of `decode` in Python 2.x. This one is about reading a gzipped pickle file in Python 3.x. The fact that the error string happens to be the same doesn't mean that it's the same problem. – abarnert Jun 22 '18 at 23:47
  • @KellyJones Try `encoding='utf-16'` then. – blhsing Jun 22 '18 at 23:47
  • I found a newer version of the code that uses `import pickle` and `training_data, validation_data, test_data = pickle.load(f, encoding="latin1")`, and now it is working. Thank you for the help, guys! – Kelly Jones Jun 23 '18 at 08:56
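
The behaviour discussed in the comments can be reproduced without the MNIST file. The following is a minimal sketch, not the original tutorial code. It assumes the pickle was written by Python 2, as abarnert suggests, and stands in a tiny hand-written protocol-2 pickle of a Python 2 `str` for the real data. The helper `load_gzipped_pickle` and the file name `demo.pkl.gz` are hypothetical names chosen for illustration. The file is opened in binary mode, and the encoding is passed to `pickle.load` rather than to `gzip.open`.

```python
import gzip
import os
import pickle
import tempfile

# Hand-written protocol-2 pickle of a Python 2 `str` holding the bytes 0x90 0xff.
# b"\x80\x02" is the protocol header, b"U\x02" is SHORT_BINSTRING with length 2,
# and b"." is STOP. This is a stand-in for mnist.pkl.gz, which is not used here.
PY2_STYLE_PICKLE = b"\x80\x02U\x02\x90\xff."


def load_gzipped_pickle(path, encoding="ASCII"):
    """Open a gzipped pickle in binary mode and hand the encoding to pickle.load."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f, encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pkl.gz")  # hypothetical file name
    with gzip.open(path, "wb") as f:
        f.write(PY2_STYLE_PICKLE)

    # Default encoding, as in the question: the 8-bit string cannot be decoded.
    try:
        load_gzipped_pickle(path)
        default_error = None
    except UnicodeDecodeError as exc:
        default_error = exc

    # latin1 maps every byte to a code point, so decoding cannot fail.
    as_text = load_gzipped_pickle(path, encoding="latin1")
    # "bytes" leaves Python 2 8-bit strings as bytes objects.
    as_bytes = load_gzipped_pickle(path, encoding="bytes")

print(type(default_error).__name__)
print(repr(as_text), repr(as_bytes))
```

With the real file, the same pattern would be `pickle.load(f, encoding="latin1")` on the handle returned by `gzip.open('../data/mnist.pkl.gz', 'rb')`, which matches the fix reported in the last comment.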

0 Answers