
The theory in these links shows that the order of layers in a convolutional network is: Convolutional Layer - Non-linear Activation - Pooling Layer.

  1. Neural networks and deep learning (equation (125))
  2. Deep learning book (page 304, 1st paragraph)
  3. Lenet (the equation)
  4. The source in this headline

But in the final implementation from those same sites, the order is: Convolutional Layer - Pooling Layer - Non-linear Activation

  1. network3.py
  2. The source code, LeNetConvPoolLayer class

I've also tried exploring the Conv2D operation's syntax, but there is no activation function there; it only performs convolution with a flipped kernel. Can someone explain why this is the case?

malioboro

3 Answers


Well, max-pooling and monotonically increasing non-linearities commute. This means that MaxPool(Relu(x)) = Relu(MaxPool(x)) for any input, so the result is the same in that case. It is therefore technically better to first subsample through max-pooling and then apply the non-linearity (if it is costly, such as the sigmoid). In practice it is often done the other way round - it doesn't seem to change performance much.
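For instance, a quick NumPy check (illustrative only; the non-overlapping 2x2 pooling helper and the 8x8 input are arbitrary choices, not from the answer) confirms that the two orders give identical results:

    import numpy as np

    def relu(x):
        return np.maximum(x, 0)

    def max_pool_2x2(x):
        # non-overlapping 2x2 max pooling over an (H, W) array with even H and W
        h, w = x.shape
        return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    x = np.random.randn(8, 8)
    # ReLU(MaxPool(x)) == MaxPool(ReLU(x)) for every input
    assert np.allclose(relu(max_pool_2x2(x)), max_pool_2x2(relu(x)))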

As for conv2D, it does flip the kernel; it implements exactly the mathematical definition of convolution. This is a linear operation, so you have to add the non-linearity yourself in the next step, e.g. theano.tensor.nnet.relu.
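As a sketch of that point (using SciPy purely for illustration, not the Theano code from the question): true convolution is cross-correlation with a flipped kernel, it is linear in its input, and the non-linearity is a separate, explicit step:

    import numpy as np
    from scipy.signal import convolve2d, correlate2d

    x = np.random.randn(6, 6)
    k = np.random.randn(3, 3)

    # convolution == cross-correlation with the kernel flipped along both axes
    conv = convolve2d(x, k, mode='valid')
    assert np.allclose(conv, correlate2d(x, np.flip(k), mode='valid'))

    # convolution is linear: conv(a*x1 + b*x2) == a*conv(x1) + b*conv(x2)
    x1, x2 = np.random.randn(6, 6), np.random.randn(6, 6)
    lhs = convolve2d(2 * x1 + 3 * x2, k, mode='valid')
    rhs = 2 * convolve2d(x1, k, mode='valid') + 3 * convolve2d(x2, k, mode='valid')
    assert np.allclose(lhs, rhs)

    # the non-linearity has to be added yourself as a separate step
    activated = np.maximum(conv, 0)  # ReLU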

eickenberg
  • Ah, that's right, the result is the same (after today's experiment); my guess is it's implemented like that because of the cost. Thanks :) – malioboro Feb 22 '16 at 15:10
  • Convolution is not a linear operation; that's why if you remove all of your non-linearities such as ReLU, sigmoid, etc. you will still have a working network. The convolution operation is implemented as a correlation operation for performance reasons, and in a neural network, since the filters are learned automatically, the end effect is the same as with a convolution filter. Apart from that, in backpropagation the convolutional nature is taken into account, therefore it really is a convolution operation taking place, and thus a non-linear one. – Hossein Sep 26 '16 at 06:28
  • 10
    convolution *is* a linear operation, as is cross-correlation. Linear both in the data and in the filters. – eickenberg Sep 27 '16 at 08:01
  • What about the case where you want CONV -> ReLU -> CONV -> ReLU -> POOL? I have seen some architectures structured this way. – Feras Oct 05 '16 at 14:22
  • 1
    What about average pooling? In https://github.com/adobe/antialiased-cnns they use Conv->Relu->BlurPool – mrgloom Mar 13 '20 at 17:20

In many papers people use conv -> pooling -> non-linearity. That does not mean you can't use another order and get reasonable results. In the case of a max-pooling layer and ReLU, the order does not matter (both compute the same thing):

MaxPool(ReLU(x)) = ReLU(MaxPool(x))

You can prove that this is the case by remembering that ReLU is an element-wise operation and a non-decreasing function, so

ReLU(MaxPool(x)) = max(0, max(x_1, ..., x_k)) = max(max(0, x_1), ..., max(0, x_k)) = MaxPool(ReLU(x))

The same thing happens for almost every activation function (most of them are non-decreasing). But it does not work for a general pooling layer, such as average pooling.
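A concrete counterexample for average pooling (the numbers here are chosen for illustration, not taken from the answer): averaging a negative and a positive value before ReLU gives a different result than averaging after ReLU.

    import numpy as np

    relu = lambda v: np.maximum(v, 0)
    x = np.array([-2.0, 2.0])

    print(relu(x.mean()))   # ReLU(AvgPool(x)) = ReLU(0)      -> 0.0
    print(relu(x).mean())   # AvgPool(ReLU(x)) = mean(0, 2)   -> 1.0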


Although both orders produce the same result, Activation(MaxPool(x)) does it significantly faster by performing fewer operations. For a pooling layer of size k, it makes k^2 times fewer calls to the activation function.

Sadly, this optimization is negligible for a CNN, because the majority of the time is spent in the convolutional layers.
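If you want to see both effects yourself, here is a rough timing sketch (the array size, pool size k = 2 and repetition count are arbitrary): applying ReLU after pooling touches k^2 = 4 times fewer elements, though the measured gap stays modest because the pooling itself dominates here, in line with the remark above.

    import timeit
    import numpy as np

    def relu(v):
        return np.maximum(v, 0)

    def max_pool_2x2(v):
        # non-overlapping 2x2 max pooling
        h, w = v.shape
        return v.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    x = np.random.randn(1024, 1024)

    # ReLU after 2x2 pooling vs. ReLU before pooling
    print(timeit.timeit(lambda: relu(max_pool_2x2(x)), number=200))
    print(timeit.timeit(lambda: max_pool_2x2(relu(x)), number=200))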

Salvador Dali

Max pooling is a sample-based discretization process. The objective is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality and allowing for assumptions to be made about the features contained in the binned sub-regions.
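As an illustration of that down-sampling (a minimal NumPy sketch; the 4x4 input and the 2x2 non-overlapping window are arbitrary choices):

    import numpy as np

    x = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 2],
                  [7, 2, 9, 0],
                  [3, 8, 4, 6]], dtype=float)

    # non-overlapping 2x2 max pooling: each output value summarises one 2x2 sub-region
    pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)
    # [[6. 4.]
    #  [8. 9.]]

Each 2x2 sub-region is reduced to its maximum, so the 4x4 representation is down-sampled to 2x2.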