
I would like to use the function tf.nn.conv2d() on a single image example, but the TensorFlow documentation seems to only mention applying this transformation to a batch of images.

The docs mention that the input image must be of shape [batch, in_height, in_width, in_channels] and the kernel must be of shape [filter_height, filter_width, in_channels, out_channels]. However, what is the most straightforward way to achieve 2D convolution with input shape [in_height, in_width, in_channels]?

Here is an example of the current approach, where img has shape (height, width, channels):

import tensorflow as tf
img = tf.random_uniform((10, 10, 3))  # a single image
kernel = tf.random_uniform((3, 3, 3, 16))  # example kernel: [filter_height, filter_width, in_channels, out_channels]
img = tf.nn.conv2d([img], kernel, strides=[1, 1, 1, 1], padding='SAME')[0]  # create a batch of 1, then index out the single example

I am reshaping the input as follows:

[in_height, in_width, in_channels] -> [1, in_height, in_width, in_channels] -> [in_height, in_width, in_channels]

This feels like an unnecessary and costly operation when I am only interested in transforming one example.

Is there a simple/standard way to do this that doesn't involve reshaping?

Gabriel Ibagon

1 Answer


AFAIK there is no way around it. It seems (here and here) that the batching operation (wrapping the image as [img]) creates a copy (someone correct me if I'm wrong). You may use tf.expand_dims instead, though; IMO it's more readable because of its verbosity.
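A minimal sketch of that variant (the kernel shape and the strides/padding arguments are assumptions here, just to make the snippet self-contained):

import tensorflow as tf

img = tf.random_uniform((10, 10, 3))        # [in_height, in_width, in_channels]
kernel = tf.random_uniform((3, 3, 3, 16))   # assumed [filter_height, filter_width, in_channels, out_channels]
batched = tf.expand_dims(img, axis=0)       # [1, 10, 10, 3] -- the batch dimension is added explicitly
out = tf.nn.conv2d(batched, kernel, strides=[1, 1, 1, 1], padding='SAME')
out = tf.squeeze(out, axis=0)               # back to [10, 10, 16], batch dimension removed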

On the other hand, taking the 0th element from the tensor (the [0] indexing) should not perform a copy in this case and is almost free.

Most importantly, except for a little inconvenience with syntax (e.g. the [0]), those operations definitely are not costly, especially in the context of performing a convolution.

BTW, other ready-made alternatives, like the layers in tf.keras, require a batch as the first dimension as well.
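For example (a quick sketch reusing img from above; the layer parameters are arbitrary), a tf.keras convolution layer rejects a rank-3 input outright:

conv = tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding='same')
out = conv(tf.expand_dims(img, axis=0))  # OK: output shape [1, 10, 10, 16]
# conv(img) would raise an error: Conv2D expects a 4D [batch, height, width, channels] input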

Szymon Maszke