
I noticed in a number of places that people use something like this, usually in fully convolutional networks, autoencoders, and similar:

model.add(UpSampling2D(size=(2,2)))
model.add(Conv2DTranspose(filters=f, kernel_size=k, padding='same', strides=(1,1)))

I am wondering what is the difference between that and simply:

model.add(Conv2DTranspose(filters=f, kernel_size=k, padding='same', strides=(2,2)))
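
For concreteness, here is a minimal runnable comparison of the two stacks (a sketch with placeholder values f=8, k=3 and an arbitrary 16x16x4 input, assuming tf.keras). Both variants produce the same output shape:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import UpSampling2D, Conv2DTranspose

f, k = 8, 3  # placeholder filter count and kernel size for this sketch

# Variant 1: fixed upsampling followed by a stride-1 transposed convolution
model1 = Sequential()
model1.add(UpSampling2D(size=(2, 2), input_shape=(16, 16, 4)))
model1.add(Conv2DTranspose(filters=f, kernel_size=k, padding='same', strides=(1, 1)))
print(model1.output_shape)  # (None, 32, 32, 8)

# Variant 2: a single stride-2 transposed convolution
model2 = Sequential()
model2.add(Conv2DTranspose(filters=f, kernel_size=k, padding='same', strides=(2, 2),
                           input_shape=(16, 16, 4)))
print(model2.output_shape)  # (None, 32, 32, 8) -- same shape, different computation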

Links towards any papers that explain this difference are welcome.

Aleksandar Jovanovic

2 Answers


Here and here you can find a really nice explanation of how transposed convolutions work. To sum up both of these approaches:

  1. In your first approach, you are first upsampling your feature map:

    [[1, 2], [3, 4]] -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
    

    and then you apply a classical convolution (as Conv2DTranspose with stride=1 and padding='same' is equivalent to Conv2D).

  2. In your second approach, you are first un(max)pooling your feature map:

    [[1, 2], [3, 4]] -> [[1, 0, 2, 0], [0, 0, 0, 0], [3, 0, 4, 0], [0, 0, 0, 0]]
    

    and then apply a classical convolution with your kernel_size, filters, etc. (both expansion steps are illustrated in the NumPy sketch after this list).

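A minimal NumPy sketch of the two expansion steps, using the array values from the examples above:

import numpy as np

x = np.array([[1, 2], [3, 4]])

# Approach 1: nearest-neighbour upsampling -- what UpSampling2D does by default
upsampled = x.repeat(2, axis=0).repeat(2, axis=1)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]

# Approach 2: zero insertion -- what a stride-2 Conv2DTranspose effectively does
# before sliding its kernel over the result
unpooled = np.zeros((4, 4), dtype=x.dtype)
unpooled[::2, ::2] = x
# [[1 0 2 0]
#  [0 0 0 0]
#  [3 0 4 0]
#  [0 0 0 0]]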

A fun fact is that, although these approaches are different, they share something in common. A transposed convolution is meant to approximate the gradient of a convolution, so the first approach approximates the gradient of sum pooling, whereas the second approximates the gradient of max pooling. This makes the first approach produce slightly smoother results.
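
You can see this correspondence with a small TensorFlow sketch (assuming TF 2.x; the specific input and upstream weighting are arbitrary):

import tensorflow as tf

x = tf.Variable(tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1)))
w = tf.reshape(tf.constant([[1.0, 2.0], [3.0, 4.0]]), (1, 2, 2, 1))  # upstream gradient

with tf.GradientTape() as tape:
    # sum pooling (average pooling times the window size)
    loss = tf.reduce_sum(tf.nn.avg_pool2d(x, 2, 2, 'VALID') * 4.0 * w)
print(tf.reshape(tape.gradient(loss, x), (4, 4)))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]  -> exactly nearest-neighbour upsampling of w

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.nn.max_pool2d(x, 2, 2, 'VALID') * w)
print(tf.reshape(tape.gradient(loss, x), (4, 4)))
# [[0. 0. 0. 0.]
#  [0. 1. 0. 2.]
#  [0. 0. 0. 0.]
#  [0. 3. 0. 4.]]  -> w scattered into zeros, like the unpooling above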

Other reasons why you might see the first approach are:

  • Conv2DTranspose (and its equivalents) is relatively new in Keras, so for a long time the only way to perform learnable upsampling was to use UpSampling2D,
  • The author of Keras, François Chollet, used this approach in one of his tutorials,
  • In the past, equivalents of transposed convolution seemed to work awfully in Keras due to some API inconsistencies.
Marcin Możejko

I just want to point out a couple of things that were mentioned above. UpSampling2D is not a learnable layer, since it has literally zero parameters.

Also, the fact that François Chollet used the first approach in one of his tutorials does not by itself justify using it.
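
A quick check of the first point (a minimal sketch, assuming tf.keras; the input shape is arbitrary):

from tensorflow.keras.layers import UpSampling2D, Conv2DTranspose

up = UpSampling2D(size=(2, 2))
up.build((None, 8, 8, 16))
print(up.count_params())      # 0 -- nothing to learn

tconv = Conv2DTranspose(filters=16, kernel_size=3, strides=(2, 2), padding='same')
tconv.build((None, 8, 8, 16))
print(tconv.count_params())   # 3*3*16*16 + 16 = 2320 trainable parameters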

박찬성