
Given the two 3D tensors t1 = [?, 1, 1, 1, 2048] and t2 = [?, 3, 1, 1, 256] seen in the image, how would these be concatenated? Currently, I am using:

tf.concat([t1, t2], 4)

However, given that my architecture has a large number of layers with many concatenations, I eventually end up with a tensor that is too large (in terms of channels/features) to initialize. Is this the correct way to implement a concatenation layer?

[image: diagram of the inception-style layer, with 1x1, 1x3 and 3x1 convolution branches merged by concatenation and an orange residual-shortcut arrow]

Devin Haslam

1 Answer


First of all, the shapes of the tensors in the inception layer are not as you define them. 1x1, 1x3 and 3x1 are the shapes of the filters applied to the image. There are two more parameters in a convolution, padding and striding, and depending on their exact values the resulting shape can be very different.

In this particular case the spatial shape doesn't change; only the channel dimensions differ (2048 and 256), which is why the feature maps can be concatenated along that axis. Concatenating your original t1 and t2, on the other hand, will result in an error, because every dimension other than the concatenation axis must match, and their second dimensions are 1 and 3.
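For illustration, here is a minimal sketch (assuming TensorFlow 1.x and made-up shapes) of concatenating two feature maps that agree in every dimension except the channel axis:

    import tensorflow as tf

    # Illustrative 5D feature maps: [batch, depth, height, width, channels].
    # All dimensions except the concatenation axis must match.
    t1 = tf.placeholder(tf.float32, [None, 8, 8, 8, 2048])
    t2 = tf.placeholder(tf.float32, [None, 8, 8, 8, 256])

    merged = tf.concat([t1, t2], axis=4)  # shape: [None, 8, 8, 8, 2304]

If the spatial dimensions disagreed (as in your original t1 and t2, where the second dimension is 1 vs. 3), the same call would fail with a shape error.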

Is this the correct way to implement a concatenation layer?

Yes, feature-map concatenation is one of the key ideas of the inception network, and its implementation indeed uses tf.concat (e.g., see the inception v1 source code).
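As a rough sketch (not the actual inception v1 code, which is written with tf.slim), an inception-style block built from 3D convolutions could look like this; the filter counts and kernel sizes are made up for illustration:

    import tensorflow as tf

    def inception_block(inputs):
        # Parallel convolution branches over a 5D input
        # [batch, depth, height, width, channels]; 'same' padding keeps the
        # spatial shape, so the branches differ only in channel count.
        branch1 = tf.layers.conv3d(inputs, filters=64, kernel_size=1,
                                   padding='same', activation=tf.nn.relu)
        branch2 = tf.layers.conv3d(inputs, filters=32, kernel_size=(1, 1, 3),
                                   padding='same', activation=tf.nn.relu)
        branch3 = tf.layers.conv3d(inputs, filters=32, kernel_size=(1, 3, 1),
                                   padding='same', activation=tf.nn.relu)
        # Merge the branch outputs along the channel axis.
        return tf.concat([branch1, branch2, branch3], axis=4)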

Note that this tensor grows in only one direction (channels / features), but it contracts in the spatial dimensions because of downsampling, so it won't get too large. Also note that this tensor is the transformed input data (the image); unlike the weights, it is not initialized, but rather flows through the network. The weights will be the tensors 1x1x2048 = 2048, 1x3x224 = 672, 3x1x256 = 768, etc. As you can see, they are not very big at all, which is another idea of the inception network.
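To make the distinction concrete, here is a small sketch (again assuming TensorFlow 1.x, with illustrative shapes) showing that the trainable weights of a 1x1x1 convolution over a 2048-channel feature map stay small, while the activations keep their spatial dimensions:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 8, 8, 8, 2048])   # activations
    y = tf.layers.conv3d(x, filters=64, kernel_size=1, padding='same')

    # Kernel shape is (1, 1, 1, 2048, 64) -- about 131k parameters.
    print([v.shape for v in tf.trainable_variables()])
    print(y.shape)  # (?, 8, 8, 8, 64): data flowing through, not weights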

Maxim
  • Thank you for your help. The orange arrow in the image represents a shortcut for residual learning. I have recently been told that this will not be a simple concatenation, "but a sum of the skipped and not skipped channels." If you are familiar with this topic would you mind letting me know how I would implement this in tensorflow? – Devin Haslam Oct 30 '17 at 17:20
  • @DevinHaslam in short, it's a simple `plus` operation, exactly as it is pictured (see the sketch after these comments). But you may also need to pad dimensions with zeros to match the shapes of both tensors (the number of channels changes). I'll be happy to answer in detail if you ask a particular question. – Maxim Oct 30 '17 at 17:51
  • https://stackoverflow.com/questions/47021810/residual-learning-in-tensorflow Thank you @Maxim – Devin Haslam Oct 30 '17 at 18:20
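Following up on the comment above, here is a minimal sketch (an assumption on my part, not code from the thread) of the zero-padding plus addition in TensorFlow 1.x, for 5D tensors [batch, depth, height, width, channels] whose spatial shapes already match:

    import tensorflow as tf

    def residual_add(shortcut, x):
        # Zero-pad the shortcut's channel dimension when the main path
        # increased the number of channels, then add the two tensors.
        extra = x.get_shape().as_list()[-1] - shortcut.get_shape().as_list()[-1]
        if extra > 0:
            shortcut = tf.pad(shortcut,
                              [[0, 0], [0, 0], [0, 0], [0, 0], [0, extra]])
        return x + shortcut

An alternative to zero-padding is projecting the shortcut with a 1x1 convolution so that its channel count matches the main path.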