Tensorflow CNN training images are all different sizes

Question

I have created a deep convolutional neural network to classify individual pixels in an image. My training data will always be the same size (32x32x7), but my testing data can be any size.

Github Repository

Currently, my model only works on images that are all the same size. I relied heavily on the TensorFlow MNIST tutorial to construct my model, and that tutorial only uses 28x28 images. How would the following MNIST model be changed to accept images of any size?

 x = tf.placeholder(tf.float32, shape=[None, 784])
 y_ = tf.placeholder(tf.float32, shape=[None, 10])
 W = tf.Variable(tf.zeros([784,10]))
 b = tf.Variable(tf.zeros([10]))
 x_image = tf.reshape(x, [-1, 28, 28, 1])

To make things a little more complicated, my model has transpose convolutions where the output shape needs to be specified. How would I adjust the following line of code so that the transpose convolution outputs a shape that is the same size as the input?

  DeConnv1 = tf.nn.conv3d_transpose(layer1, filter = w, output_shape = [1,32,32,7,1], strides = [1,2,2,2,1], padding = 'SAME')

Generally, you should use the same pipeline to get data into your classification system for both training and for inference. How do you generate the 32x32x7 images? Use that same technique to get data into your classification system regardless of your task. — RagingRoosevelt, Jan 03 '18 at 15:45

score 6 · Accepted Answer · answered Dec 31 '17 at 02:00

Unfortunately there's no way to build dynamic graphs in TensorFlow (you could try with Fold, but that's outside the scope of the question). This leaves you with two options:

Bucketing: You create multiple input tensors in a few hand-picked sizes and then at runtime you choose the right bucket (see the seq2seq with bucketing example). Either way you'll probably need the second option.
Resize the input and output images. Assuming the images all maintain the same aspect ratio, you can try resizing the image before inference. Not sure why you care about the output, since MNIST is a classification task.

Either way you can use the same approach:

from PIL import Image

basewidth = 28  # MNIST image width
img = Image.open('your_input_img.jpg')

# Scale the height by the same factor as the width to keep the aspect ratio
wpercent = basewidth / float(img.size[0])
hsize = int(float(img.size[1]) * wpercent)

# Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
img = img.resize((basewidth, hsize), Image.LANCZOS)

# Save image or feed directly to tensorflow
img.save('feed_to_tf.jpg')
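
The bucketing option above could be sketched roughly as follows. This is only an illustration, not code from the linked example: the bucket sizes and the helper names `pick_bucket` and `pad_to_bucket` are hypothetical, and zero-padding at the bottom/right is an assumed choice for filling a bucket.

```python
import numpy as np

# Hypothetical hand-picked bucket sizes (height, width); not from the original answer
BUCKETS = [(32, 32), (64, 64), (128, 128)]

def pick_bucket(height, width, buckets=BUCKETS):
    """Return the smallest bucket that fits the image, or None if none does."""
    for bh, bw in sorted(buckets):
        if height <= bh and width <= bw:
            return bh, bw
    return None

def pad_to_bucket(image, bucket):
    """Zero-pad a (H, W, C) array at the bottom and right up to the bucket size."""
    bh, bw = bucket
    h, w = image.shape[:2]
    return np.pad(image, ((0, bh - h), (0, bw - w), (0, 0)), mode="constant")

image = np.ones((40, 50, 7), dtype=np.float32)  # stand-in for a test image
bucket = pick_bucket(*image.shape[:2])
padded = pad_to_bucket(image, bucket)
print(bucket, padded.shape)  # (64, 64) (64, 64, 7)
```

Each bucket would then get its own input tensor of that fixed size, and images larger than the biggest bucket would still need resizing.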

Forgetting everything I said about the MNIST data, how would I go about having variable-sized input for my task? The paper mentions that they use deconvolutions so that input of any size is possible. However, when I use a deconvolution as seen in my question, I need to specify an output size. — Devin Haslam, Jan 12 '18 at 16:18
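
On the output size raised in this comment: the arithmetic can be sketched without TensorFlow, assuming the usual 'SAME' padding rule that a strided convolution maps a length n to ceil(n / stride). Under that assumption a transposed convolution can only produce lengths that would map back to its input length, which is why the size has to be stated. The helper names below are hypothetical. As far as I know, `output_shape` may also be a tensor, so in TensorFlow 1.x it is often built from `tf.shape(...)` of the network input instead of being hard-coded.

```python
import math

def conv_same_size(in_size, stride):
    """Output length of a 'SAME'-padded strided convolution: ceil(in / stride)."""
    return math.ceil(in_size / stride)

def valid_transpose_sizes(in_size, stride):
    """Every length n with ceil(n / stride) == in_size, i.e. the sizes a
    'SAME' transposed convolution could be asked to produce from in_size."""
    return list(range((in_size - 1) * stride + 1, in_size * stride + 1))

original = (32, 32, 7)   # spatial shape from the question
strides = (2, 2, 2)      # spatial strides from the question

# Shape after one strided convolution
encoded = tuple(conv_same_size(n, s) for n, s in zip(original, strides))
print(encoded)  # (16, 16, 4)

# The depth of 4 is ambiguous: both 7 and 8 map to it, so the shape must be given
print(valid_transpose_sizes(encoded[2], strides[2]))  # [7, 8]
```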

score 1 · Answer 2 · answered Jan 04 '18 at 10:48

The MNIST model code you mentioned is an example of a fully connected (FC) network, not a convolutional network. The input shape of [None, 784] corresponds to the flattened MNIST image size (28 x 28 = 784), so this FC network has a fixed input size.

What you are asking for is not possible in FC networks, because the number of weights and biases depends on the input shape. It is possible with a fully convolutional architecture, so my suggestion is to use one: its weights and biases do not depend on the input shape.
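
A minimal NumPy sketch of that difference, assuming a naive stride-1, zero-padded convolution as a stand-in for a real convolutional layer (`conv2d_same` is a hypothetical helper, not a TensorFlow function):

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive zero-padded, stride-1 2D cross-correlation (odd kernel sizes only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.ones((3, 3)) / 9.0  # 9 weights, regardless of the image size

# The same kernel works on inputs of different sizes
small = conv2d_same(np.random.rand(32, 32), kernel)
large = conv2d_same(np.random.rand(50, 71), kernel)
print(small.shape, large.shape)  # (32, 32) (50, 71)

# A fully connected layer ties its weight matrix to one input size
fc_weights = np.zeros((32 * 32, 10))
try:
    np.random.rand(1, 50 * 71) @ fc_weights
    fc_accepts_large = True
except ValueError:
    fc_accepts_large = False
print(fc_accepts_large)  # False
```

The convolution reuses the same nine weights at every position, whereas the FC weight matrix has one row per input pixel and therefore rejects any other image size.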

score 1 · Answer 3 · answered Jan 05 '18 at 22:34

Adding to @gidim's answer, here is how you can resize the images in Tensorflow, and feed the results directly to your inference. Note: This method scales and distorts the image, which might increase your loss.

All credit goes to Prasad Pai's article on Data Augmentation.

import tensorflow as tf
import numpy as np
from PIL import Image

IMAGE_SIZE = 32
CHANNELS = 1

def tf_resize_images(X_img_file_paths):
    X_data = []
    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, (None, None, CHANNELS))
    tf_img = tf.image.resize_images(X, (IMAGE_SIZE, IMAGE_SIZE),
                                    tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Each image is resized individually because images may differ in size.
        for index, file_path in enumerate(X_img_file_paths):
            # feed_dict needs an array, not a PIL Image. Converting to grayscale
            # ('L') is an assumption made here so the data matches CHANNELS = 1.
            img = np.array(Image.open(file_path).convert('L'), dtype=np.float32)
            img = img[..., np.newaxis]  # (height, width) -> (height, width, 1)
            resized_img = sess.run(tf_img, feed_dict={X: img})
            X_data.append(resized_img)

    X_data = np.array(X_data, dtype=np.float32)  # Convert to numpy
    return X_data

How would you load the data with TF to then use resize? As Pillow decode is not numerically accurate — Echo9k, Feb 05 '22 at 18:49
