8

I am using a simple object detection model in PyTorch for inference.

When I use a simple loop over the image list:

for k, image_path in enumerate(image_list):
    image = imgproc.loadImage(image_path)
    print(image.shape)
    x = image.cuda()               # move the single-image tensor to the GPU
    with torch.no_grad():
        y, feature = net(x)

It prints out variable-sized images such as:

torch.Size([1, 3, 384, 320])

torch.Size([1, 3, 704, 1024])

torch.Size([1, 3, 1280, 1280])

But when I do batch inference using a DataLoader, applying the same transformation, the code does not run. However, when I resize all the images to 600x600, the batch processing runs successfully.

I have two doubts:

First, why is PyTorch capable of accepting dynamically sized inputs to a deep learning model? And second, why does dynamically sized input fail in batch processing?

Abhik Sarkar

3 Answers

11

PyTorch has what is called a Dynamic Computational Graph.

It allows the graph of the neural network to dynamically adapt to its input size, from one input to the next, during training or inference. This is what you observe in your first example: providing an image as a Tensor of size [1, 3, 384, 320] to your model, then another one as a Tensor of size [1, 3, 704, 1024], and so forth, is completely fine, as the model dynamically adapts to each input.
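
As a minimal sketch of this behaviour (using a toy convolutional model in place of your net, with random tensors standing in for real images), feeding inputs of different spatial sizes one at a time works because the graph is built anew for each forward pass:

    import torch
    import torch.nn as nn

    # Toy fully convolutional model standing in for `net`
    net = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 1, kernel_size=3, padding=1),
    )
    net.eval()

    with torch.no_grad():
        for h, w in [(384, 320), (704, 1024), (1280, 1280)]:
            x = torch.randn(1, 3, h, w)      # one image per forward pass
            y = net(x)
            print(x.shape, '->', y.shape)    # each call adapts to its own input size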

However, if your input is actually a collection of inputs (a batch), it is another story. A batch, for PyTorch, will be transformed into a single Tensor input with one extra dimension. For example, if you provide a list of n images, each of the size [1, 3, 384, 320], PyTorch will stack them, so that your model has a single Tensor input, of the shape [n, 1, 3, 384, 320].

This "stacking" can only happen between images of the same shape. To provide a more "intuitive" explanation than previous answers, this stacking operation cannot be done between images of different shapes, because the network cannot "guess" how the different images should "align" with one another in a batch, if they are not all the same size.

No matter if it happens during training or testing, if you create a batch out of images of varying size, PyTorch will refuse your input.
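
As a minimal illustration (with made-up tensors of the sizes from your question), torch.stack, which is what the DataLoader's default collation uses under the hood, only accepts tensors of identical shape:

    import torch

    a = torch.randn(3, 384, 320)
    b = torch.randn(3, 704, 1024)

    print(torch.stack([a, a]).shape)   # works: identical shapes -> torch.Size([2, 3, 384, 320])

    try:
        torch.stack([a, b])            # fails: the shapes differ, so they cannot be stacked
    except RuntimeError as err:
        print(err)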

Several solutions are commonly used: resizing, as you did; adding padding (often zeros along the borders of your images) to extend the smaller images to the size of the biggest one; and so forth.
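
For the padding option, one common pattern (a sketch, assuming your Dataset yields [C, H, W] tensors) is a custom collate_fn that pads every image in a batch to the largest height and width found in that batch:

    import torch
    import torch.nn.functional as F

    def pad_collate(batch):
        # batch is a list of [C, H, W] tensors of varying H and W
        max_h = max(img.shape[1] for img in batch)
        max_w = max(img.shape[2] for img in batch)
        padded = [
            F.pad(img, (0, max_w - img.shape[2], 0, max_h - img.shape[1]))  # pad right/bottom
            for img in batch
        ]
        return torch.stack(padded)     # now all shapes match -> [N, C, max_h, max_w]

    # loader = torch.utils.data.DataLoader(dataset, batch_size=4, collate_fn=pad_collate)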

Clef.
  • Can I collate it according to size? – Abhik Sarkar Jul 09 '20 at 14:38
  • If you mean creating batches of items of the same size (one batch for items of [1, 3, 384, 320], one for items of [1, 3, 384, 1024], ...), yes, and it is called "bucketing". – Clef. Jul 09 '20 at 15:21
  • Can you share some documentation around bucketing? I couldn't find any examples for computer vision. – Abhik Sarkar Jul 13 '20 at 07:24
  • This [link](https://developers.google.com/machine-learning/data-prep/transform/bucketing) is a basic explanation of the concept of bucketing (not applied to CV specifically). Can you detail what you don't understand about it? – Clef. Jul 14 '20 at 12:42
  • Fantastic answer, Clef! – CB Madsen Jan 08 '21 at 01:16
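
A minimal sketch of the bucketing idea from the comments (the helper name and the list of sizes are hypothetical): group dataset indices by image size, build each batch from a single group, and hand the resulting index lists to the DataLoader as a batch_sampler:

    from collections import defaultdict
    from torch.utils.data import DataLoader

    def bucket_batches(sizes, batch_size):
        # sizes: one (H, W) tuple per dataset index
        buckets = defaultdict(list)
        for idx, hw in enumerate(sizes):
            buckets[hw].append(idx)
        for indices in buckets.values():
            for i in range(0, len(indices), batch_size):
                yield indices[i:i + batch_size]   # every batch has a single spatial size

    # batches = list(bucket_batches(all_sizes, batch_size=8))
    # loader = DataLoader(dataset, batch_sampler=batches)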
3

Two reasons come to mind for why the network can process different sized images:

  1. The simplest reason is that the network begins with an interpolation layer that resizes the input to the size the network expects.
  2. You can feed dynamically sized inputs to the model when you have a fully convolutional network, which means that none of the operations in the network depends on the input's spatial size. For example, if all of the layers are convolutions, ReLU, and pooling, the network can process whatever size you give it. So you can insert a batch of size [N, 3, 384, 320] or [N, 3, 704, 1024] and the network will run for both (see the sketch below).

The reason you can't run batched inference with different sizes is that a single tensor cannot hold items of different sizes. The batch tensor of images needs to have one fixed shape, (N, C, H, W); you can't have one image of size (H', W') and another of size (H, W) inside it, because they must live in the same tensor with a single shape.

But you can train/infer with a different size for each batch: for example, the first batch of images can be (N, C, H, W) and the next batch (N, C, H', W').
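
A small sketch of both points (using a toy fully convolutional model and random tensors): because there are no fully connected layers, nothing in the network depends on the spatial size, so batches of different sizes run fine as long as each batch is internally consistent:

    import torch
    import torch.nn as nn

    # No Linear layers, so nothing depends on the input's spatial size
    fcn = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 1, kernel_size=3, padding=1),
    )

    batch_a = torch.randn(4, 3, 384, 320)    # (N, C, H, W)
    batch_b = torch.randn(4, 3, 704, 1024)   # same N and C, different H and W

    print(fcn(batch_a).shape)   # torch.Size([4, 1, 192, 160])
    print(fcn(batch_b).shape)   # torch.Size([4, 1, 352, 512])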

Amitay Nachmani
2

The reason it may work for single images but not for batches is that, for a batch, the DataLoader's default collate function calls torch.stack across your batch. This won't work because although the channel dimensions may line up (1 for grayscale or 3 for RGB), the height and width dimensions of the images will not line up correctly! This was covered above.

A way to fix this is to find the maximum size of any image in your dataset and resize every image to that size. A better way might be to pad each image to that size instead, while saving the REAL size of every image so that you can later crop it back. Here's an example:

The object returned from the Dataset's __getitem__:

size = [1024, 1024]              # the REAL (pre-padding) size of this image

return {'image': image,          # the image, already padded to the dataset maximum
        'size': size}

Where you use the data:

image = batch['image']
size = batch['size'][batch_index]                           # real size saved by the Dataset
single_image = image[batch_index, :, :size[0], :size[1]]    # crop back to the real (H, W)

Now, a batch of images was returned, but you can extract them at their original sizes. You probably don't want to do this if you need to run the whole batch through the network at once, but it's worth thinking about.
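
Putting the idea together as a self-contained sketch (the dataset class, the MAX_H/MAX_W values, and the random tensors are assumptions for illustration): each image is padded to a fixed maximum size, its real size is stored alongside it, and after batching each image can be cropped back:

    import torch
    import torch.nn.functional as F
    from torch.utils.data import Dataset, DataLoader

    MAX_H, MAX_W = 1024, 1024       # assumed maximum size over the whole dataset

    class PaddedImageDataset(Dataset):
        def __init__(self, images):
            self.images = images    # list of [C, H, W] tensors of varying size

        def __len__(self):
            return len(self.images)

        def __getitem__(self, idx):
            img = self.images[idx]
            h, w = img.shape[1], img.shape[2]
            # pad right/bottom so every item ends up with shape [C, MAX_H, MAX_W]
            img = F.pad(img, (0, MAX_W - w, 0, MAX_H - h))
            return {'image': img, 'size': torch.tensor([h, w])}

    images = [torch.randn(3, 384, 320), torch.randn(3, 704, 1024)]
    loader = DataLoader(PaddedImageDataset(images), batch_size=2)

    for batch in loader:
        for i in range(batch['image'].shape[0]):
            h, w = batch['size'][i].tolist()
            original = batch['image'][i, :, :h, :w]   # crop back to the real size
            print(original.shape)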

relh