
In Keras, what are the layers (functions) corresponding to tf.nn.conv2d_transpose in TensorFlow? I once saw a comment that we can just use combinations of UpSampling2D and Convolution2D as appropriate. Is that right?

The following two examples both use this kind of combination.

1) In Building Autoencoders in Keras, the author builds the decoder as follows.

[image: the decoder code from the blog post - a stack of alternating Convolution2D and UpSampling2D layers]
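Since the image did not survive here, this is roughly what that decoder looks like in the Keras 1.x API (a sketch reconstructed from the blog post; the exact filter counts and input shape may differ, and channels-first ordering is assumed):

from keras.layers import Input, Convolution2D, UpSampling2D
from keras.models import Model

# Sketch: alternating Convolution2D and UpSampling2D layers grow a
# small encoding back up to a 28x28 single-channel image.
encoded = Input(shape=(8, 4, 4))  # illustrative channels-first encoding
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(encoded)
x = UpSampling2D((2, 2))(x)                                   # 4x4 -> 8x8
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
x = UpSampling2D((2, 2))(x)                                   # 8x8 -> 16x16
x = Convolution2D(16, 3, 3, activation='relu')(x)             # 'valid': 16x16 -> 14x14
x = UpSampling2D((2, 2))(x)                                   # 14x14 -> 28x28
decoded = Convolution2D(1, 3, 3, activation='sigmoid', border_mode='same')(x)
decoder = Model(input=encoded, output=decoded)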

2) In a U-Net implementation, the author builds the deconvolution as follows:

from keras.layers import merge, UpSampling2D, Convolution2D  # Keras 1.x imports

up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)
conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(up6)
conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv6)

1 Answer


The corresponding layers in Keras are Deconvolution2D layers.
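As an illustration - here is a minimal sketch of this layer in the Keras 1.x API (the shapes follow the example in the Keras documentation; channels-first ordering is assumed, and note that Keras 1 requires the full output_shape to be given explicitly):

from keras.models import Sequential
from keras.layers import Deconvolution2D

# A 3x3 transposed convolution with stride 2 that maps a 3x12x12
# input to a 3x25x25 output: (12 - 1) * 2 + 3 = 25.
model = Sequential()
model.add(Deconvolution2D(3, 3, 3,
                          output_shape=(None, 3, 25, 25),
                          subsample=(2, 2),
                          border_mode='valid',
                          input_shape=(3, 12, 12)))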

It's worth mentioning that you should be really careful with them because they can sometimes behave in unexpected ways. I strongly advise you to read this Stack Overflow question (and its answer) before you start using this layer.

UPDATE:

  1. Deconvolution is a layer which was added relatively recently - and maybe this is the reason why people advise you to use Convolution2D * UpSampling2D.
  2. Because it's relatively new - it may not work correctly in some cases. It also needs some experience to use properly.
  3. In fact - from a mathematical point of view - every Deconvolution might be presented as a composition of Convolution2D and UpSampling2D - so maybe this is the reason why it was mentioned in the texts you provided (see the sketch below).
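To illustrate point 3, here is a sketch (my own, not taken from the texts in the question) of that composition in the Keras 1.x functional API - nearest-neighbour upsampling followed by a learned convolution, standing in for a strided Deconvolution2D; the layer sizes are purely illustrative:

from keras.layers import Input, UpSampling2D, Convolution2D
from keras.models import Model

inp = Input(shape=(16, 8, 8))        # channels-first: 16 maps of 8x8
x = UpSampling2D(size=(2, 2))(inp)   # 8x8 -> 16x16
out = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(x)
model = Model(input=inp, output=out)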

UPDATE 2:

Ok. I think I found an easy explanation of why Deconvolution2D might be presented in the form of a composition of Convolution2D and UpSampling2D. We will use the definition that Deconvolution2D is the gradient of some convolution layer. Let's consider the three most common cases:

  1. The easiest one is a Convolution2D without any pooling. In this case, as it's a linear operation, its gradient is itself a convolution - so a Convolution2D.
  2. The trickier one is the gradient of Convolution2D with AveragePooling. By the chain rule: (AveragePooling2D * Convolution2D)' = AveragePooling2D' * Convolution2D'. But the gradient of AveragePooling2D is UpSampling2D * constant - so the proposition is also true in this case (a small numerical check follows this list).
  3. The trickiest one is the one with MaxPooling2D. In this case we still have (MaxPooling2D * Convolution2D)' = MaxPooling2D' * Convolution2D', but MaxPooling2D' != UpSampling2D. However, one can easily find a Convolution2D which makes MaxPooling2D' = UpSampling2D * Convolution2D (intuitively, the gradient of MaxPooling2D is a sparse matrix with a single 1 per pooling window, at the position of the maximum; as Convolution2D can express a matrix operation, it can also represent this selection). So: (MaxPooling2D * Convolution2D)' = UpSampling2D * Convolution2D * Convolution2D' = UpSampling2D * Convolution2D, because a composition of two convolutions is again a convolution.
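To make case 2 concrete, here is a small numerical check (plain NumPy, purely illustrative): the backward pass of 2x2 average pooling distributes each output gradient uniformly over its window, which is exactly nearest-neighbour upsampling scaled by the constant 1/4.

import numpy as np

grad_out = np.array([[1.0, 2.0],
                     [3.0, 4.0]])           # gradient at the pooled output

# Backward pass of 2x2 average pooling: each input pixel receives
# 1/4 of the gradient of the window it belongs to.
# np.kron(grad_out, ones((2, 2))) is exactly UpSampling2D with size (2, 2).
grad_in = np.kron(grad_out, np.ones((2, 2))) / 4.0

print(grad_in)
# [[0.25 0.25 0.5  0.5 ]
#  [0.25 0.25 0.5  0.5 ]
#  [0.75 0.75 1.   1.  ]
#  [0.75 0.75 1.   1.  ]]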

The final remark is that all parts of the proof have shown that Deconvolution2D is a composition of UpSampling2D and Convolution2D, rather than the opposite order. One can easily prove that every function of the form of a composition of UpSampling2D and Convolution2D might also be presented as a composition of Convolution2D and UpSampling2D. So basically - the proof is done :)

  • Hi Marcin, thanks for the reply. What confuses me is that in some other cases, there are implementations based on the combination of "upsampling" and "Convolution2D" aiming to achieve a similar goal. What are the differences between Deconvolution2D and this combination? I have updated my original post to include the mentioned examples. – user297850 Feb 09 '17 at 19:33
  • Thanks, so can I safely assume that they are mathematically equivalent? Are there any mathematical derivations for this? Thanks. – user297850 Feb 09 '17 at 19:45
  • It's rather tedious. I proved it for myself while writing the answer but I couldn't find a written version of this proof. If you really want - I could write it up in a conversation or look longer for a version of this proof in an article. – Marcin Możejko Feb 09 '17 at 20:07
  • Hi Marcin, thanks a lot for your very detailed answer. – user297850 Feb 09 '17 at 23:25