
After a correction to the program I am using, I get an error in my code:

import numpy as np
import gzip
import struct


def load_images(filename):
    # Open and unzip the file of images :
    with gzip.open(filename, 'rb') as f:
        # read the header, information into a bunch of variables:
        _ignored, n_images, image_columns, image_rows = struct.unpack('>IIII', bytearray(f.read()[:16]))
        print(_ignored, n_images, image_columns, image_rows)
        print(f.read()[:16])
        # read all the pixels into a long numpy array :
        all_pixels = np.frombuffer(f.read(), dtype=np.uint8)
        print(all_pixels)
        print(all_pixels.shape)
        print(all_pixels.ndim)
        # reshape the array into a matrix where each line is an image:
        images_matrix = all_pixels.reshape(n_images, image_columns * image_rows)

I get this error:

load_images("\\MNIST\\train-images-idx3-ubyte.gz")
2051 60000 28 28
b''
[]
(0,)
1
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 19, in load_images
ValueError: cannot reshape array of size 0 into shape (60000,784)

I tried to define the array, but it is still not working.

norok2
arrayLover3
2 Answers


You call read() on the file twice. After the first read, the cursor is at the end of the file, so the second read returns nothing.

The resulting array is therefore empty, and an empty array cannot be reshaped.
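The cursor behaviour can be reproduced without the MNIST file. The snippet below is only a sketch: it uses an in-memory io.BytesIO object as a stand-in for the opened gzip file, and the 20 bytes it contains are made up for illustration.

```python
import io

# Hypothetical stand-in for the opened file: 16 "header" bytes + 4 "data" bytes.
f = io.BytesIO(bytes(16) + b"\x01\x02\x03\x04")

first = f.read()    # no size argument: consumes all 20 bytes
second = f.read()   # the cursor is already at the end, so nothing is left

print(len(first), f.tell())  # 20 20
print(second)                # b''

# Rewinding and passing a size to the first read leaves the rest available.
f.seek(0)
header = f.read(16)  # only the first 16 bytes
data = f.read()      # everything after the header
print(len(header), data)     # 16 b'\x01\x02\x03\x04'
```

The same applies to the object returned by gzip.open: slicing the result of f.read() afterwards does not move the cursor back.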


xzelda

The problem is in the line that is supposed to grab the data from the file (all_pixels = np.frombuffer(f.read(), dtype=np.uint8)): the call to f.read() there does not read anything, which results in an empty array, and an empty array cannot be reshaped to (60000, 784).

The underlying reason is that f.read() without any argument reads and consumes all the remaining bytes of the open file. Your first call, f.read()[:16], therefore reads the whole file and only then slices off the first 16 bytes. By the next f.read() call, you are at the end of the file and nothing is fetched.

Instead, it looks like you would want to read the first 16 bytes as header, and read the rest as data.

To do so, pass the number of bytes you want to read for the header to your first call to .read().

This ensures that only the first few bytes are read, leaving the rest to be read by the subsequent f.read() call:

import numpy as np
import gzip
import struct


def load_images(filename):
    # Open and unzip the file of images :
    with gzip.open(filename, 'rb') as f:
        header = f.read(16)  # read the header bytes
        # read the header, information into a bunch of variables:
        _ignored, n_images, image_columns, image_rows = struct.unpack('>IIII', bytearray(header))
        print(_ignored, n_images, image_columns, image_rows)
        print(header)
        # read all the pixels into a long numpy array:
        data = f.read()  # read the data bytes
        all_pixels = np.frombuffer(data, dtype=np.uint8)
        print(all_pixels)
        print(all_pixels.shape)
        print(all_pixels.ndim)
        # reshape the array into a matrix where each line is an image:
        images_matrix = all_pixels.reshape(n_images, image_columns * image_rows)
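As a possible follow-up, here is a minimal sketch of the same idea without the debug prints, which also returns the matrix and checks that the number of pixel bytes matches the header before reshaping. The name load_images_checked and the size check are assumptions added for illustration, not part of the original code. The header fields are named in the order the IDX format stores them (magic number, image count, rows, columns), which makes no difference for 28x28 images.

```python
import gzip
import struct

import numpy as np


def load_images_checked(filename):
    """Hypothetical variant of load_images that validates the size and returns the matrix."""
    with gzip.open(filename, 'rb') as f:
        # Read exactly the 16 header bytes: four big-endian unsigned 32-bit integers.
        magic, n_images, n_rows, n_cols = struct.unpack('>IIII', f.read(16))
        # Everything after the header is pixel data, one byte per pixel.
        pixels = np.frombuffer(f.read(), dtype=np.uint8)

    expected = n_images * n_rows * n_cols
    if pixels.size != expected:
        # Fail with a clear message instead of a reshape error further down.
        raise ValueError(f"expected {expected} pixel bytes, got {pixels.size}")

    # One row per image, one column per pixel.
    return pixels.reshape(n_images, n_rows * n_cols)
```

With the header values printed in the question (60000 images of 28 by 28 pixels), load_images_checked on that file should return an array of shape (60000, 784).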
norok2