
I've been following the fastai course on machine learning. I got up to lesson four and thought I'd use what I've learned to build a model that recognizes handwritten letters. The code the course uses to load its training dataset is as follows:

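from fastai.vision.all import *

# Label each image from its file name, e.g. "beagle_12.jpg" -> "beagle"
pets1 = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+\.jpg$'), 'name'))
pets1.summary(path/"images")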

This works when you have image files, but the dataset files I have are:

emnist-letters-train-images-idx3-ubyte
emnist-letters-train-labels-idx1-ubyte
emnist-letters-test-images-idx3-ubyte
emnist-letters-test-labels-idx1-ubyte

I could extract all the images from those files, but is there a way to load the ubyte files directly into my program? The files have the same format as the MNIST digits dataset.
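For reference, here is a minimal sketch of reading such a file with NumPy alone, assuming the files really do follow the MNIST IDX layout: a 4-byte header (two zero bytes, a data-type code, the number of dimensions), then one big-endian 32-bit size per dimension, then the raw unsigned bytes. The function name `read_idx` is made up for illustration and is not part of fastai or any other library.

```python
import struct

import numpy as np


def read_idx(path):
    """Read an uncompressed IDX file (MNIST layout) into a NumPy uint8 array.

    Hypothetical helper: assumes unsigned-byte data (type code 0x08),
    which is what the MNIST image and label files use.
    """
    with open(path, "rb") as f:
        # Header: two zero bytes, a data-type code, the number of dimensions.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError(f"{path} is not an unsigned-byte IDX file")
        # One big-endian 32-bit size per dimension, e.g. (count, rows, cols).
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    if data.size != int(np.prod(shape)):
        raise ValueError(f"{path}: payload size does not match header {shape}")
    return data.reshape(shape)
```

Under that assumption, `read_idx("emnist-letters-train-images-idx3-ubyte")` should give an array of shape `(N, 28, 28)` and `read_idx("emnist-letters-train-labels-idx1-ubyte")` one of shape `(N,)`. The two arrays could then be indexed inside custom `get_x` and `get_y` functions instead of going through image files. It is worth plotting a few samples first to check their orientation.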

オパラ
  • Not a full answer, but this question has Python code, using a specialized library, for reading the images and labels: https://stackoverflow.com/questions/40427435/extract-images-from-idx3-ubyte-file-or-gzip-via-python You should be able to use that in your get_x and get_y functions. It might just be easier to extract them all first, though. – Bleyddyn Feb 03 '22 at 16:01
  • @Bleyddyn Thank you for the answer, but I've decided to learn to use PyTorch instead of fastai. – オパラ Feb 04 '22 at 14:23

0 Answers