I'm new to machine learning, and I wanted to avoid downloading the MNIST dataset from OpenML every time I need to work with it. I found this code online that converts the IDX files into NumPy arrays, but my training labels keep coming up 8 values short. I believe it has to do with the way I converted the file.
import numpy as np
import struct

with open('train-images.idx3-ubyte', 'rb') as f:
    magic, size = struct.unpack('>II', f.read(8))
    nrows, ncols = struct.unpack('>II', f.read(8))
    data = np.fromfile(f, dtype=np.dtype(np.uint8)).newbyteorder(">")
    data = data.reshape((size, nrows, ncols))

with open('train-labels.idx1-ubyte', 'rb') as i:
    magic, size = struct.unpack('>II', i.read(8))
    nrows, ncols = struct.unpack('>II', i.read(8))
    data_1 = np.fromfile(i, dtype=np.dtype(np.uint8)).newbyteorder(">")

x_train, y_train = data, data_1
len(x_train), len(y_train)
>>> (60000, 59992)
As the output above shows, the labels are faulty: with 8 values missing, not every training image can be paired with its correct label. I have re-downloaded the file several times to make sure I didn't get a corrupted copy. Please, I need help. Thanks.
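
For reference, a likely cause: in the IDX format the header length depends on the number of dimensions. The image file has a 16-byte header (magic number, count, rows, columns), but the label file has only an 8-byte header (magic number, count). A second 8-byte read on the label file would therefore consume the first 8 labels. Below is a minimal sketch that reads the header generically, assuming the standard unsigned-byte MNIST IDX layout. `read_idx` is a hypothetical helper name, not part of any library.

```python
import struct

import numpy as np


def read_idx(path):
    """Read an unsigned-byte IDX file into a NumPy array (hypothetical helper)."""
    with open(path, 'rb') as f:
        # Magic number: two zero bytes, a data-type code, and the dimension count.
        zero, dtype_code, ndim = struct.unpack('>HBB', f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError('not an unsigned-byte IDX file')
        # One big-endian 32-bit size per dimension: 3 for images, 1 for labels.
        shape = struct.unpack('>' + 'I' * ndim, f.read(4 * ndim))
        # The rest of the file is the raw data (read-only array; use .copy() to modify).
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)
```

With this, `read_idx('train-images.idx3-ubyte')` and `read_idx('train-labels.idx1-ubyte')` should return arrays of the same length, provided the files follow that layout. Byte order only matters for the header fields, so no `newbyteorder` call is needed on single-byte data.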