
I imported fetch_mldata (from sklearn.datasets import fetch_mldata) and called:

dataset = fetch_mldata('MNIST original')

but what I get is the following:

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
>     execfile(filename, namespace)
>   File "C:/Users/Jacob/Documents/Dropbox/Technion/Semester 8/Machine learning/Demo3/Demo3.py", line 75, in <module>
>     dataset = fetch_mldata('MNIST original')
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\sklearn\datasets\mldata.py", line 158, in fetch_mldata
>     matlab_dict = io.loadmat(matlab_file, struct_as_record=True)
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 126, in loadmat
>     matfile_dict = MR.get_variables(variable_names)
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 288, in get_variables
>     res = self.read_var_array(hdr, process)
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 248, in read_var_array
>     return self._matrix_reader.array_from_header(header, process)
>   File "mio5_utils.pyx", line 616, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5903)
>   File "mio5_utils.pyx", line 645, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5332)
>   File "mio5_utils.pyx", line 713, in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex (scipy\io\matlab\mio5_utils.c:6323)
>   File "mio5_utils.pyx", line 417, in scipy.io.matlab.mio5_utils.VarReader5.read_numeric (scipy\io\matlab\mio5_utils.c:3873)
>   File "mio5_utils.pyx", line 353, in scipy.io.matlab.mio5_utils.VarReader5.read_element (scipy\io\matlab\mio5_utils.c:3595)
>   File "streams.pyx", line 324, in scipy.io.matlab.streams.FileStream.read_string (scipy\io\matlab\streams.c:4343)
> IOError: could not read bytes

I tried installing a newer version of sklearn, but it didn't help. I saw another thread about this problem (How to use datasets.fetch_mldata() in sklearn?), but the solution offered there didn't help me.

Any ideas?

Kobi Barac
2 Answers


For your/others' reference, I was getting virtually the same errors (Ubuntu), including that "IOError: could not read bytes" error.

I just posted a solution at

How to use datasets.fetch_mldata() in sklearn?

Short answer - use the following:

from sklearn.datasets.mldata import fetch_mldata

# Use the default data directory:
data = fetch_mldata('mnist-original')

# Or download into a directory of your choice:
dataset = fetch_mldata('mnist-original', data_home='***')

Replace *** (keep the quotes) with your preferred location (data directory).

Victoria Stuart

In my case, the root cause was a corrupted mnist-original.mat file. The file was corrupted because I terminated Python before the file was fully downloaded. This left a partially downloaded mnist-original.mat at C:\user\Taimi\scikit_learn_data\mldata.

The solution above worked for me because it simply fetched a new copy into a new location. A more direct solution is to locate the corrupted mnist-original.mat file, delete it, and run the code again. The code will then download mnist-original.mat afresh. The complete mnist-original.mat is 54,142 KB, so on a slow connection fetch_mldata() will take a few minutes to complete.
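The cleanup step described above could be scripted roughly as follows. This is only a sketch: the function name remove_if_truncated is made up for illustration, the mldata subdirectory layout is taken from the path mentioned above, and the 1% size tolerance is an assumption of mine to allow for rounding in the 54,142 KB figure.

```python
from pathlib import Path

# Size of the complete file as quoted above (54,142 KB); approximate.
EXPECTED_KB = 54142


def remove_if_truncated(data_home, name="mnist-original", expected_kb=EXPECTED_KB):
    """Delete <data_home>/mldata/<name>.mat if it looks partially downloaded.

    Returns True if a file was deleted, False otherwise.
    The 1% tolerance below is an assumption, not something sklearn defines.
    """
    mat_file = Path(data_home).expanduser() / "mldata" / (name + ".mat")
    if not mat_file.is_file():
        return False  # nothing cached, nothing to clean up
    size_kb = mat_file.stat().st_size / 1024
    if size_kb < expected_kb * 0.99:
        mat_file.unlink()  # the next fetch_mldata() call will download it again
        return True
    return False
```

For example, remove_if_truncated("~/scikit_learn_data") would check the usual default data directory (adjust the path if you passed a different data_home). If it returns True, run the fetch_mldata() call again and let the download finish.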

Taimi