4

I was looking at this TensorFlow tutorial.

In the tutorial the images are magically read like this:

mnist = learn.datasets.load_dataset("mnist")
train_data = mnist.train.images

My images are placed in two directories:

../input/test/
../input/train/

They all have a *.jpg ending.

So how can I read them into my program?

I don't think I can use learn.datasets.load_dataset because this seems to take in a specialized dataset structure, while I only have folders with images.

Monica Heddneck
bsky

2 Answers

6

mnist.train.images is essentially a NumPy array of shape [55000, 784], where 55000 is the number of images and 784 is the number of pixels in each image (each image is 28x28).
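To make that shape concrete, here is a minimal sketch using made-up stand-in arrays (the `images` list is hypothetical, not data from the tutorial). It only illustrates how 28x28 images become rows of 784 values:

```python
import numpy as np

# Hypothetical stand-ins for three decoded 28x28 grayscale images.
images = [np.zeros((28, 28), dtype=np.float32) for _ in range(3)]

# Flatten each image into a 784-element row, then stack the rows.
data = np.stack([img.flatten() for img in images])

print(data.shape)  # (3, 784)
```

With 55000 images instead of 3, the same steps would give the [55000, 784] shape described above.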

If you want to run this exact code, you need to create a similar NumPy array from your own data. That means iterating over all your images, reading each one as a NumPy array, flattening it, and stacking the results into a matrix of shape [num_examples, image_size].

The following code snippet should do it:

import os
import cv2
import numpy as np
def load_data(img_dir):
    return np.array([cv2.imread(os.path.join(img_dir, img)).flatten() for img in os.listdir(img_dir) if img.endswith(".jpg")])

A more verbose version that makes debugging easier:

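import os
import cv2
import numpy as np

list_of_imgs = []
img_dir = "../input/train/"
# List the image directory itself, not the current working directory.
for name in os.listdir(img_dir):
    if not name.endswith(".jpg"):
        continue
    img = os.path.join(img_dir, name)
    a = cv2.imread(img)  # returns None if the file cannot be decoded
    if a is None:
        print("Unable to read image: " + img)
        continue
    list_of_imgs.append(a.flatten())
train_data = np.array(list_of_imgs)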

Note: If your images are not 28x28x1 (single-channel grayscale), you will need to change the neural network architecture (defined in cnn_model_fn). The architecture in the tutorial is a toy architecture that only works for simple images like MNIST. AlexNet may be a good place to start for RGB images.

user1523170
  • That returns `AttributeError: 'NoneType' object has no attribute 'flatten'`. For some reason it can't see the images even though I'm sure that I specified the correct folder. – bsky Jul 05 '17 at 19:00
  • Does the folder contain images that are not jpegs? You can try the more comprehensive code above to help debug for which images cv2 is returning None – user1523170 Jul 06 '17 at 05:24
  • Also try the updated one-liner code. My earlier code didn't work if not executed from the same directory as the images. – user1523170 Jul 06 '17 at 05:35
2

You can check the answers given in "How do I convert a directory of jpeg images to TFRecords file in tensorflow?". The easiest way is to use the utility provided by TensorFlow, build_image_data.py, which does exactly what you want.

Vijay Mariappan