
I have been experimenting with a Keras example, which needs to import MNIST data

from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()

It generates error messages such as Exception: URL fetch failure on https://s3.amazonaws.com/img-datasets/mnist.pkl.gz: None -- [Errno 110] Connection timed out

It should be related to the network environment I am using. Is there any function or code that can let me directly import the MNIST data set that has been manually downloaded?

I tried the following approach

import sys
import pickle
import gzip
f = gzip.open('/data/mnist.pkl.gz', 'rb')
if sys.version_info < (3,):
    data = pickle.load(f)
else:
    data = pickle.load(f, encoding='bytes')
f.close()
import numpy as np
(x_train, _), (x_test, _) = data

Then I get the following error message

Traceback (most recent call last):
File "test.py", line 45, in <module>
(x_train, _), (x_test, _) = data
ValueError: too many values to unpack (expected 2)
nbro
user785099

6 Answers


Well, the keras.datasets.mnist module is really short. You can manually replicate what it does, that is:

  1. Download the dataset from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
  2. Load it with the following code:

    import sys
    import gzip
    # cPickle exists only on Python 2; fall back to pickle on Python 3
    try:
        import cPickle
    except ImportError:
        import pickle as cPickle

    f = gzip.open('mnist.pkl.gz', 'rb')
    if sys.version_info < (3,):
        data = cPickle.load(f)
    else:
        data = cPickle.load(f, encoding='bytes')
    f.close()
    (x_train, _), (x_test, _) = data
    
sygi
  • Hi sygi, thanks for the suggestion. However, I got error message as shown in the updated post. The only thing being different with yours is that I use pickle. Looks like it did not give me error during loading the data. – user785099 Nov 19 '16 at 20:13
  • 1
    I have checked and it works on my system, with both pickle and cPickle and both python 2 and 3. Are you sure you have the same file (md5 b39289ebd4f8755817b1352c8488b486)? – sygi Nov 19 '16 at 20:27
  • It works, do not know why it had error message previously. Thanks a lot. – user785099 Nov 20 '16 at 05:31
  • In my case it worked adding those imports `import sys; import pickle; import gzip;` and using `pickle` instead of `cPickle` – I'm using Python 3.6.7 on macOs Mojave – Giorgio Tempesta Aug 05 '19 at 20:08
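The md5 check sygi mentions can be done from Python itself; a minimal sketch (the expected digest is the one quoted in the comment above, and `md5_of_file` is just an illustrative helper name):

```python
import hashlib

def md5_of_file(path, chunk_size=8192):
    """Return the md5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Compare against the digest from the comment above:
# md5_of_file('mnist.pkl.gz') == 'b39289ebd4f8755817b1352c8488b486'
```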

The Keras file is now located at a new path on Google Cloud Storage (before, it was on AWS S3):

https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz

When using:

tf.keras.datasets.mnist.load_data()

You can pass a path parameter.

load_data() will call get_file(), which takes fname as a parameter; if path is a full path and the file exists, it will not be downloaded again.

Example:

# gsutil cp gs://tensorflow/tf-keras-datasets/mnist.npz /tmp/data/mnist.npz
# python3
>>> import tensorflow as tf
>>> path = '/tmp/data/mnist.npz'
>>> (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data(path)
>>> len(train_images)
60000
gogasca

You do not need additional code for this; you can tell load_data to use a local copy in the first place:

  1. You can download the file https://s3.amazonaws.com/img-datasets/mnist.npz from another computer with proper (proxy) access (taken from https://github.com/keras-team/keras/blob/master/keras/datasets/mnist.py),
  2. copy it to the directory ~/.keras/datasets/ (on Linux and macOS),
  3. and run load_data(path='mnist.npz') with the right file name.
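The steps above can be sketched in Python; this is only an illustration (`fetch_mnist` and the use of urllib are my own, not part of Keras; the URL and cache directory are the ones named in the steps):

```python
import os
import urllib.request

# Step 2's target: the default Keras cache directory on Linux/macOS.
cache_dir = os.path.join(os.path.expanduser("~"), ".keras", "datasets")
target = os.path.join(cache_dir, "mnist.npz")

def fetch_mnist():
    """Download mnist.npz into ~/.keras/datasets/ unless it is already there."""
    os.makedirs(cache_dir, exist_ok=True)
    if not os.path.exists(target):
        urllib.request.urlretrieve(
            "https://s3.amazonaws.com/img-datasets/mnist.npz", target)

# Step 3: with the file in place, load_data() finds it by name
# and skips the download:
# from keras.datasets import mnist
# (x_train, y_train), (x_test, y_test) = mnist.load_data(path='mnist.npz')
```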
tardis
  1. Download file https://s3.amazonaws.com/img-datasets/mnist.npz
  2. Move mnist.npz to the ~/.keras/datasets/ directory
  3. Load data

    import keras
    from keras.datasets import mnist
    
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    
Sundeep1501

keras.datasets.mnist.load_data() will attempt to fetch from the remote repository even when a local file path is specified. However, the easiest workaround for loading the downloaded file is to use numpy.load(), just as the Keras loader does internally:

import numpy as np

path = '/tmp/data/mnist.npz'

with np.load(path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
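A small wrapper around this pattern, assuming the standard mnist.npz layout (arrays named x_train, y_train, x_test, y_test), returns the same tuple structure that load_data() does; `load_local_mnist` is a hypothetical helper name:

```python
import numpy as np

def load_local_mnist(path):
    """Load a locally stored mnist.npz and return it in the
    ((x_train, y_train), (x_test, y_test)) shape Keras uses."""
    with np.load(path, allow_pickle=True) as f:
        return ((f['x_train'], f['y_train']),
                (f['x_test'], f['y_test']))
```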
Rakurai

Gogasca's answer worked for me with a small adjustment. For Python 3.9, changing the code in ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.py so that it uses the path variable as a full path, instead of prepending origin_folder, makes it possible to pass any local path to the downloaded file.

  1. Download the file: https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
  2. Put it in ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/, or another location of your choosing.
  3. Alter ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.py so that load_data uses the given path directly, by commenting out the get_file call:

    # origin_folder = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
    # path = get_file(
    #     path, origin=origin_folder + 'mnist.npz',
    #     file_hash='731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1')
    with np.load(path, allow_pickle=True) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    return (x_train, y_train), (x_test, y_test)
  4. Use the following code to load the data:

    path = "/Users/username/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.npz"
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data(path=path)
DemanB