
I'm trying to figure out an effective format for storing my image dataset for machine learning. Right now I save my training images, test images, and labels as NumPy arrays in a .npz file, but I was hoping to store the images in the style of the MNIST database. My images are sized (128, 128) and have 3 color channels. Is there a way to convert my image data to a better format and gzip it for compression? Thank you.
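
For reference, the MNIST files themselves are gzipped IDX files: a small big-endian header (two zero bytes, a type code, the number of dimensions, then each dimension size as a 32-bit integer) followed by the raw array bytes. A minimal writer along those lines, assuming uint8 data (save_idx_gz is a hypothetical helper, not a library function):

import gzip
import struct

import numpy as np

TYPE_UBYTE = 0x08  # IDX type code for unsigned-byte data

def save_idx_gz(arr, path):
    """Write a uint8 array as a gzipped, MNIST-style IDX file."""
    arr = np.ascontiguousarray(arr, dtype=np.uint8)
    with gzip.open(path, 'wb') as f:
        # Header: two zero bytes, the type code, the dimension count,
        # then each dimension size as a big-endian 32-bit integer.
        f.write(bytes([0, 0, TYPE_UBYTE, arr.ndim]))
        for dim in arr.shape:
            f.write(struct.pack('>I', dim))
        # Raw pixel data follows in C order, as in the MNIST files.
        f.write(arr.tobytes())

# (N, 128, 128, 3) color images give a 4-dimensional IDX file, one
# dimension more than MNIST's (N, 28, 28), so a reader must expect 4 dims.
save_idx_gz(np.zeros((4, 128, 128, 3), dtype=np.uint8),
            'train-images-idx4-ubyte.gz')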

This is my current way of storing the 11,500 images I have collected.

import numpy as np

def load_data():
    """Load the train/test splits from the saved .npz archive."""
    print("Loading Recycle Data")
    path = 'recycle_data_shuffled.npz'
    recycle_data = np.load(path)
    x_train, y_train = recycle_data['x_train'], recycle_data['y_train']
    x_test, y_test = recycle_data['x_test'], recycle_data['y_test']
    recycle_data.close()

    return (x_train, y_train), (x_test, y_test)
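
If the goal is simply to shrink the .npz, NumPy's np.savez_compressed writes the same archive layout with each array deflate-compressed, so load_data() above keeps working unchanged. A sketch, using placeholder arrays in place of the real dataset (gains on photographic uint8 data are often modest, as the comments below note):

import numpy as np

# Placeholder arrays standing in for the real 11,500-image dataset.
x_train = np.zeros((10, 128, 128, 3), dtype=np.uint8)
y_train = np.zeros(10, dtype=np.uint8)
x_test = np.zeros((2, 128, 128, 3), dtype=np.uint8)
y_test = np.zeros(2, dtype=np.uint8)

# Same file name and keys the loader expects; each array is zip-deflated.
np.savez_compressed('recycle_data_shuffled.npz',
                    x_train=x_train, y_train=y_train,
                    x_test=x_test, y_test=y_test)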
  • You might be [better off *not* gzipping images](https://webmasters.stackexchange.com/q/8382/103389). – unutbu Sep 10 '19 at 17:51
  • `tar.gz` is not a convenient format for backups or for quick access on a local computer. Extracting one file from a large `tar.gz` [can take about as long as extracting all the files](https://stackoverflow.com/a/26067782/190597) from the `tar.gz`. So unless you want to bundle the images for downloading, there may not be any advantage to using `tar` either. – unutbu Sep 10 '19 at 18:06
  • I am trying to bundle the images for downloading. Is there an easy way I can do this? @unutbu – Anthony Sep 10 '19 at 19:52
  • If one directory contains all 11500 files (and no others), then [this is what you want](https://stackoverflow.com/q/2032403/190597). – unutbu Sep 10 '19 at 21:41
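
Following up on the last two comments, bundling one directory of images into a single tar.gz for downloading takes a few lines with the standard-library tarfile module. A minimal sketch, assuming the 11,500 files live in a hypothetical images/ directory:

import tarfile

# Pack the whole directory into one gzip-compressed archive.
with tarfile.open('recycle_images.tar.gz', 'w:gz') as tar:
    tar.add('images', arcname='images')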

0 Answers