I'm trying to understand the transformation performed by tf.layers.conv2d.
The MNIST tutorial code from the TensorFlow website includes this convolution layer:
# Computes 64 features using a 5x5 filter.
# Padding is added to preserve width and height.
# Input Tensor Shape: [batch_size, 14, 14, 32]
# Output Tensor Shape: [batch_size, 14, 14, 64]
conv2 = tf.layers.conv2d(
    inputs=pool1,
    filters=64,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
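
To see the shapes concretely, I built the layer against a dummy input (a minimal sketch, assuming the TF 1.x API; the placeholder is a hypothetical stand-in for pool1):

import tensorflow as tf

# Stand-in for pool1: a batch of 14x14 feature maps with 32 channels
pool1 = tf.placeholder(tf.float32, shape=[None, 14, 14, 32])

conv2 = tf.layers.conv2d(
    inputs=pool1,
    filters=64,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)

print(conv2.shape)  # (?, 14, 14, 64), matching the comment above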
However, my expectation was that the number of input channels would be multiplied by the number of filters, since each filter is applied to each of the 32 input feature maps, giving an output tensor of [batch_size, 14, 14, 2048]. Clearly this is wrong, but I don't know why. How does the transformation actually work? The API documentation tells me nothing about how the filters are applied across the input channels. What would the output be if the input tensor were [batch_size, 14, 14, 48]?
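
For that last question, here is the same sketch with a hypothetical 48-channel input (again assuming TF 1.x, continuing from the snippet above):

# Same layer configuration, but 48 input channels instead of 32
pool_48 = tf.placeholder(tf.float32, shape=[None, 14, 14, 48])

conv_48 = tf.layers.conv2d(
    inputs=pool_48,
    filters=64,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)

print(conv_48.shape)  # also (?, 14, 14, 64) -- why 64, and not 48 * 64?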