In short
The activation function is used to create non-linearity between layers,
which are otherwise purely linear (a layer without an activation function is just a linear function),
and we usually choose the activation function based on our task.
For example, we use ReLU between the layers of a neural network to create that non-linearity,
and we use sigmoid in the output layer to squash values into the 0-1 range
for binary classification, using 0.5 as a threshold to classify between the two classes.
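To see concretely why that matters, here is a minimal sketch (NumPy, with toy numbers of my own) showing that two linear layers with no activation in between collapse into a single linear layer, while a ReLU in between breaks that equivalence:

    import numpy as np

    x = np.array([1.0, 2.0])                      # input features
    W1, b1 = np.array([[0.5, -1.0],
                       [2.0,  1.0]]), np.array([0.1, 0.2])
    W2, b2 = np.array([[1.0,  1.0]]), np.array([0.3])

    h = W1 @ x + b1                               # first linear layer
    y = W2 @ h + b2                               # second linear layer, no activation

    # the two layers are equivalent to ONE linear layer:
    W_eq, b_eq = W2 @ W1, W2 @ b1 + b2
    assert np.allclose(y, W_eq @ x + b_eq)        # passes: still just linear

    # a ReLU in between breaks this collapse, adding real non-linearity:
    y_nonlinear = W2 @ np.maximum(h, 0) + b2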
Long
To fully grasp how activation functions are used in neural networks,
we first need a clear understanding of the difference,
within a neural network, between:
- Neurons & layers
- The trainable parameters of the network (weights & biases)
- The activation function
To understand neural networks, I recommend understanding the linear regression model first, since that makes the role of the weights easier to grasp.
y = mx + b is a linear function that can be leveraged to create a simple model
that can predict data with a linear correlation (we call this model linear regression),
with "x" as the input, "y" as the output, and "m, b" as the parameters.
This "m" and "b" are trainable parameters, while x is an input feature.
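As a minimal sketch (plain Python; the toy numbers are my own), here is that model plus one gradient-descent step that nudges m and b, which is exactly what "trainable" means:

    m, b = 2.0, 1.0                    # trainable parameters
    x, target = 3.0, 10.0              # input feature and the true value

    y = m * x + b                      # prediction: 7.0

    # one gradient-descent step on the squared error 0.5 * (y - target)**2
    lr = 0.01
    error = y - target                 # 7.0 - 10.0 = -3.0
    m -= lr * error * x                # d(loss)/dm = error * x
    b -= lr * error                    # d(loss)/db = error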
More explanation about linear regression:
(It's a bit hard to explain since I can't attach images at my reputation level, so I'll attach a video link instead.)
Assuming you are already familiar with linear regression:
a neural network is like a chain of connected linear models;
each unit is called a neuron, and neurons stack into layers.
Layer 1 example (the first layer is generally called the input layer,
because it's the layer we put our features into):
[x1]
[x2]
[x3]
Each neuron in a layer has a "line" connecting it to every neuron in the next layer.
Each "line" carries its own w (weight), which is a trainable parameter,
the same as the "m" and "b" we can train in y = mx + b.
When computing, the inputs are placed into the x's of the input layer,
then multiplied by the weight on the line connecting to each neuron of the next layer, and summed up at the destination (together with that neuron's bias b).
The formula is
y_j = sum_i(x_i * w_ij) + b_j
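Here is a minimal NumPy sketch of that formula (the numbers are toy values of my own): each destination neuron sums input times weight over all of its incoming lines, plus its bias:

    import numpy as np

    x = np.array([170.0, 65.0])        # inputs, e.g. height and weight
    W = np.array([[0.4, -0.3],         # weights on the lines into neuron 1
                  [0.1,  0.7]])        # weights on the lines into neuron 2
    b = np.array([1.0, -2.0])          # one bias per destination neuron

    y = W @ x + b                      # y_j = sum_i(x_i * w_ij) + b_j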
To simplify, in the image below,
you can think of it as computing 2 linear regression models separately,
then summing their outputs up and using that sum as the input for the next linear regression model, which predicts the gender.
IT IS AT THIS POINT that we really need the activation function.
Assume that the previous layers provide the information needed to predict
BIOLOGICAL gender, given height & weight.
The possible output of the output layer, which is just the
y = mx + b formula,
ranges from -infinity to +infinity.
How would you classify this output into two classes?
The answer is to use an activation function such as sigmoid,
which squashes any range of values into the 0-1 range.
Thinking of this as a probability, we can now use a 0.5 cutoff threshold
to classify anything below 0.5 as class "0" and anything above 0.5 as class "1".
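A minimal sketch of that final step (plain Python; the raw output value is a made-up example):

    import math

    def sigmoid(z):
        # squashes any real number into the 0-1 range
        return 1.0 / (1.0 + math.exp(-z))

    raw_output = -2.3                         # unbounded output of the last linear layer
    prob = sigmoid(raw_output)                # ~0.09
    predicted_class = 1 if prob > 0.5 else 0  # 0.09 < 0.5, so class "0"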
(image: a simple neural network)
Summary
As you can see, we don't train the activation function;
it's the trainable parameters (weights & biases) that get trained!
The activation function is used to create non-linearity between layers,
and we usually choose the activation function based on our task,
such as sigmoid for binary classification.
To implement a custom activation, just create a function that receives one input and returns something:

    def custom_act(x):
        # applied element-wise to the input tensor; here it simply negates it
        return -x
In case you need it to be trainable (which you usually don't),
refer to this question, which already has a good explanation:
Pytorch custom activation functions?
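For completeness, here is a minimal sketch (my own toy model, not from the linked question) of plugging such a custom activation into a PyTorch network:

    import torch
    import torch.nn as nn

    def custom_act(x):
        return -x                                # element-wise, works on tensors

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.hidden = nn.Linear(2, 4)
            self.out = nn.Linear(4, 1)

        def forward(self, x):
            x = custom_act(self.hidden(x))       # custom activation between layers
            return torch.sigmoid(self.out(x))    # sigmoid on the output layer

    net = TinyNet()
    prob = net(torch.tensor([[170.0, 65.0]]))    # a probability in (0, 1)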
Additional Information
- What Is Neural Network
- Using Sigmoid for Logistic Regression
- Activation Function Explain For Binary Classification and Keras Implementation