TensorFlow: Train model on a custom image dataset

Question

I am interested in training and evaluating a convolutional neural net model on my own set of images. I want to use the tf.layers module for my model definition, along with a tf.learn.Estimator object to train and evaluate the model using the fit() and evaluate() methods, respectively.

Here is the tutorial that I have been following, which is helpful for showcasing the tf.layers module and the tf.learn.Estimator class. However, the dataset that it uses (MNIST) is simply imported and loaded (as NumPy arrays). See the following main function from the tutorial script:

def main(unused_argv):
  # Load training and eval data
  mnist = learn.datasets.load_dataset("mnist")
  train_data = mnist.train.images  # Returns np.array
  train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
  eval_data = mnist.test.images  # Returns np.array
  eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

  # Create the Estimator
  mnist_classifier = learn.Estimator(
      model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

  # Set up logging for predictions
  # Log the values in the "Softmax" tensor with label "probabilities"
  tensors_to_log = {"probabilities": "softmax_tensor"}
  logging_hook = tf.train.LoggingTensorHook(
      tensors=tensors_to_log, every_n_iter=50)

  # Train the model
  mnist_classifier.fit(
      x=train_data,
      y=train_labels,
      batch_size=100,
      steps=20000,
      monitors=[logging_hook])

  # Configure the accuracy metric for evaluation
  metrics = {
      "accuracy":
          learn.MetricSpec(
              metric_fn=tf.metrics.accuracy, prediction_key="classes"),
  }

  # Evaluate the model and print results
  eval_results = mnist_classifier.evaluate(
      x=eval_data, y=eval_labels, metrics=metrics)
  print(eval_results)

Full code here

I have my own images in jpg format, arranged in the following directory structure:

data
    train
        classA
            1.jpg
            2.jpg
            ...
        classB
            3.jpg
            4.jpg
            ...
        ...
    validate
        classA
            5.jpg
            6.jpg
            ...
        classB
            ...
        ...

I have also converted my image directories into TFRecord format, with one TFRecord file for training and one for validation. I followed this tutorial, which uses the build_image_data.py script from the Inception model that comes with TensorFlow as a black box that outputs these TFRecord files. I admit that I may have put the cart before the horse by creating these, but I thought there might be a way to use them as inputs to the tf.learn.Estimator's fit() and evaluate() methods.

Questions

How can I format my jpg (or TFRecord) data so that I can use them as inputs to the Estimator object's functions?

I'm assuming I have to convert my images and labels to NumPy arrays, as shown in the code above; however, it is not clear how the mnist.train.images and mnist.train.validation are formatted.

Does anyone have any experience with converting jpg files and labels to NumPy arrays that this Estimator class expects as inputs?

Any help would be greatly appreciated.

I know the question is for Tensorflow (and I'm trying to find how to do this in Tensorflow), but this is super easy to do in PyTorch: https://github.com/pytorch/vision#imagefolder — finbarr, Jul 07 '17 at 15:38
This answer might be useful for you: https://stackoverflow.com/questions/34340489/tensorflow-read-images-with-labels — finbarr, Jul 07 '17 at 16:02

score 2 · Accepted Answer · edited Sep 12 '17 at 18:47

The file that you have referenced, cnn_mnist.py, and specifically the call to mnist_classifier.fit, requires NumPy arrays as input for x and y. Therefore, I will address your second and third questions, as TFRecords may not be easily incorporated into the referenced code.

however, it is not clear how the mnist.train.images and mnist.train.validation are formatted

mnist.train.images is a NumPy array with shape (55000, 784), where 55000 is the number of images and 784 is the dimension of each flattened image (28 x 28). mnist.validation.images is also a NumPy array, with shape (5000, 784).
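To make the layout concrete, here is a small sketch that uses a zero-filled stand-in array (not the real MNIST data) to show how each 784-value row corresponds to one 28 x 28 single-channel image. The variable names are only illustrative.

```python
import numpy as np

# Stand-in for mnist.train.images: 3 fake "images", each flattened to 784 values.
batch = np.zeros((3, 784), dtype=np.float32)

# Each row can be viewed again as a 28 x 28 image with one channel.
as_images = batch.reshape(-1, 28, 28, 1)

print(batch.shape, as_images.shape)
```

An array built from your own images needs the same structure: one row per image, and every row the same length.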

Does anyone have any experience with converting jpg files and labels to NumPy arrays that this Estimator class expects as inputs?

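The following code reads in one JPEG image as a three-dimensional NumPy array:

    # scipy.misc.imread needs SciPy < 1.2 (it was removed in later releases)
    from scipy.misc import imread
    filename = '1.jpg'
    np_1 = imread(filename)  # shape: (height, width, channels)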
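For newer environments, a minimal sketch of the same read using Pillow instead, assuming Pillow is installed. The helper name read_jpeg is hypothetical and is not part of any library.

```python
import numpy as np
from PIL import Image


def read_jpeg(filename):
    """Read an image file into a (height, width, 3) uint8 NumPy array."""
    with Image.open(filename) as img:
        # Force three channels so grayscale and RGB files give the same layout.
        return np.asarray(img.convert("RGB"))
```

Calling read_jpeg('1.jpg') would then play the same role as imread('1.jpg') above.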

I assume all of these images are the same size, or that you are able to resize them to the same size, considering that you have already generated TFRecord files from this dataset. All that is left to do is flatten each image and vertically stack the flattened images into a single two-dimensional array with one row per image. That array can be passed as x to the Estimator's fit() and evaluate() methods, with a matching one-dimensional array of integer labels passed as y.

Below is code to flatten and vertically stack two three-dimensional NumPy arrays:

    import numpy as np
    # np_2 is a second image, read with imread in the same way as np_1
    np_1_2 = np.vstack((np_1.flatten(), np_2.flatten()))
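Putting these steps together for the data/train/classA/... layout from the question, here is a minimal sketch that walks the class folders, resizes and flattens each image, and returns the stacked array with integer labels. It uses Pillow rather than scipy.misc, and the function name load_image_folder, the default size, and the scaling to [0, 1] are all assumptions made for illustration; the model's input reshape would have to match whatever size and channel count you choose.

```python
import os

import numpy as np
from PIL import Image


def load_image_folder(root, size=(28, 28)):
    """Return (images, labels, class_names) for a root/<class>/<file>.jpg layout."""
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    rows, labels = [], []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            if not fname.lower().endswith((".jpg", ".jpeg")):
                continue
            with Image.open(os.path.join(class_dir, fname)) as img:
                # Convert to RGB and resize so every row has the same length.
                img = img.convert("RGB").resize(size)
                pixels = np.asarray(img, dtype=np.float32)
            # Flatten to one row and scale to [0, 1] (an assumed convention).
            rows.append(pixels.flatten() / 255.0)
            labels.append(label)
    images = np.vstack(rows)  # shape: (num_images, width * height * 3)
    return images, np.asarray(labels, dtype=np.int32), class_names
```

With this sketch, train_data, train_labels, _ = load_image_folder('data/train') would give arrays that can stand in for mnist.train.images and the int32 labels in the tutorial's main function, and the same call on 'data/validate' would give the evaluation arrays.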
