
At first, this question is less about programming itself and more about the logic behind the CNN architecture. I understand how every layer works, but my only question is: does it make sense to separate the ReLU and the convolution layer? I mean, can a ConvLayer exist, work, and update its weights via backpropagation without having a ReLU behind it?

I thought so. This is why I created the following independent layers:

  1. ConvLayer
  2. ReLU
  3. Fully Connected
  4. Pooling
  5. Transformation (flattens the 3D output into one dimension) for ConvLayer -> Fully Connected.

I am thinking about merging Layers 1 and 2 into one. Which option should I go for?
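For reference, a standalone ReLU layer with its own forward and backward pass could look roughly like this (a minimal NumPy sketch, illustrative rather than my exact implementation):

    import numpy as np

    class ReLULayer:
        """Standalone ReLU layer: elementwise max(0, x) with its own backward pass."""

        def forward(self, x):
            self.mask = x > 0          # remember which inputs were positive
            return x * self.mask

        def backward(self, grad_output):
            # gradient passes only where the forward input was positive
            return grad_output * self.mask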

Luecx

3 Answers


Can it exist?

Yes, it can. There is nothing that stops a neural network from working without non-linearity modules in the model. The thing is, skipping the non-linearity module between two adjacent layers is equivalent to a single linear combination of the inputs at layer 1 to produce the output at layer 2:

M1 : Input =====> L1 ====> ReLU ====> L2 =====> Output

M2 : Input =====> L1 ====> ......... ====> L2 =====> Output

M3 : Input =====> L1 =====> Output

M2 & M3 are equivalent since the parameters adjust themselves over the training period to generate the same output. If there is any pooling involved in between, this may not be true but as long as the layers are consecutive, the network structure is just one large linear combination (Think PCA)

There is nothing that prevents the gradient updates & back-propagation throughout the network.

What should you do?

Keep some form of non-linearity between distinct layers. You may create convolution blocks that contain more than one convolution layer, but you should include a non-linear function at the end of these blocks, and definitely after the dense layers. For dense layers, not using an activation function is completely equivalent to using a single layer.
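As an illustration only (filter counts and shapes are made up), a Keras-style sketch of such a block, with the non-linearity closing the conv block and an activation after each dense layer:

    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Conv2D(32, 3, padding="same", input_shape=(28, 28, 1)),
        layers.Conv2D(32, 3, padding="same"),
        layers.ReLU(),                          # non-linearity at the end of the conv block
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),    # dense layer followed by its activation
        layers.Dense(10, activation="softmax"),
    ])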

Have a look here: Quora: Role of activation functions

anakin

The short answer is: ReLU (or some other activation mechanism) should be added to each of your convolution or fully connected layers.

CNNs, and neural networks in general, use activation functions like ReLU to introduce non-linearity into the model. Activation functions are usually not a layer themselves; they are an additional computation applied to each node of a layer. You can see them as an implementation of the mechanism that decides between finding versus not finding a specific pattern. See this post.
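For illustration (the values and shapes below are made up), the elementwise nature of ReLU: it is applied to every node's pre-activation rather than being a parameterised layer of its own.

    import numpy as np

    def relu(x):
        # elementwise non-linearity applied to each node's pre-activation
        return np.maximum(0.0, x)

    # hypothetical pre-activations of one fully connected layer, batch of 2
    z = np.array([[ 1.5, -0.3, 0.0],
                  [-2.0,  0.7, 4.2]])
    a = relu(z)    # [[1.5, 0.0, 0.0], [0.0, 0.7, 4.2]]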

Omar Kaab
  • Oh yeah... I used the sigmoid activation function by default on my fully connected neurons. Just having problems with the fully connected layer :) – Luecx Jun 19 '17 at 15:49

From the TensorFlow perspective, all of your computations are nodes in the graph (usually run inside a session). So if you want to separate layers, which means adding nodes to your computation graph, go ahead, but I don't see any practical reason for it. You can backpropagate through it of course, since you are just calculating the gradient of every function by differentiation.
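A minimal sketch of that separation (written with current eager-mode TensorFlow rather than the old graph/session API, and with made-up shapes): the convolution and the ReLU are distinct nodes, and gradients flow through both.

    import tensorflow as tf

    image   = tf.random.normal([1, 8, 8, 1])               # one 8x8 single-channel image
    filters = tf.Variable(tf.random.normal([3, 3, 1, 4]))  # four 3x3 filters

    with tf.GradientTape() as tape:
        conv_out = tf.nn.conv2d(image, filters, strides=1, padding="SAME")  # convolution node
        relu_out = tf.nn.relu(conv_out)                                     # separate ReLU node
        loss = tf.reduce_sum(relu_out)

    grads = tape.gradient(loss, filters)   # backprop works through both nodes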

Mohammad Mirzaeyan