In short
The activation function is used to create non-linearity between layers,
which are otherwise purely linear (a layer without an activation function is just a linear function),
and we usually choose the activation function based on our task.
For example, we use ReLU between the layers of a neural network to create that non-linearity,
and we use sigmoid in the output layer to squash values into the 0-1 range
for binary classification, using 0.5 as a threshold to classify between the two classes.
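To see concretely why that matters, here is a minimal sketch (NumPy, with toy numbers of my own) showing that two linear layers with no activation in between collapse into a single linear layer, while a ReLU in between breaks that equivalence:

    import numpy as np

    x = np.array([1.0, 2.0])                      # input features
    W1, b1 = np.array([[0.5, -1.0],
                       [2.0,  1.0]]), np.array([0.1, 0.2])
    W2, b2 = np.array([[1.0,  1.0]]), np.array([0.3])

    h = W1 @ x + b1                               # first linear layer
    y = W2 @ h + b2                               # second linear layer, no activation

    # the two layers are equivalent to ONE linear layer:
    W_eq, b_eq = W2 @ W1, W2 @ b1 + b2
    assert np.allclose(y, W_eq @ x + b_eq)        # passes: still just linear

    # a ReLU in between breaks this collapse, adding real non-linearity:
    y_nonlinear = W2 @ np.maximum(h, 0) + b2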
Long
To fully grasp how activation functions are used in neural networks,
we first need a clear understanding of the difference,
within a neural network, between:
- Neurons & layers
- The trainable parameters of the network (weights & biases)
- The activation function
To understand neural networks, I recommend understanding the linear regression model first, since that makes the role of the weights easier to grasp.
y = mx + b is a linear function that can be leveraged to create a simple model
that can predict data with a linear correlation (we call this model linear regression),
with "x" as the input, "y" as the output, and "m, b" as the parameters.
This "m" and "b" are trainable parameters, while x is an input feature.
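As a minimal sketch (plain Python; the toy numbers are my own), here is that model plus one gradient-descent step that nudges m and b, which is exactly what "trainable" means:

    m, b = 2.0, 1.0                    # trainable parameters
    x, target = 3.0, 10.0              # input feature and the true value

    y = m * x + b                      # prediction: 7.0

    # one gradient-descent step on the squared error 0.5 * (y - target)**2
    lr = 0.01
    error = y - target                 # 7.0 - 10.0 = -3.0
    m -= lr * error * x                # d(loss)/dm = error * x
    b -= lr * error                    # d(loss)/db = error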
More explanation about linear regression:
(It's a bit hard to explain since I can't attach images at my reputation level, so I'll attach a video link instead.)
Assuming you are already familiar with linear regression:
a neural network is like a chain of connected linear models;
each unit is called a neuron, and neurons stack into layers.
Layer 1 example (the first layer is generally called the input layer,
because it's the layer we put our features into):
[x1]
[x2]
[x3]
Each neuron in a layer has a "line" connecting it to every neuron in the next layer.
Each "line" carries its own w (weight), which is a trainable parameter,
the same as the "m" and "b" we can train in y = mx + b.
When computing, the inputs are placed into the x's of the input layer,
then multiplied by the weight on the line connecting to each neuron of the next layer, and summed up at the destination (together with that neuron's bias b).
The formula is
y_j = sum_i(x_i * w_ij) + b_j
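Here is a minimal NumPy sketch of that formula (the numbers are toy values of my own): each destination neuron sums input times weight over all of its incoming lines, plus its bias:

    import numpy as np

    x = np.array([170.0, 65.0])        # inputs, e.g. height and weight
    W = np.array([[0.4, -0.3],         # weights on the lines into neuron 1
                  [0.1,  0.7]])        # weights on the lines into neuron 2
    b = np.array([1.0, -2.0])          # one bias per destination neuron

    y = W @ x + b                      # y_j = sum_i(x_i * w_ij) + b_j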
To simplify, in the image below,
you can think of it as computing 2 linear regression models separately,
then summing their outputs up and using that sum as the input for the next linear regression model, which predicts the gender.
IT IS AT THIS POINT that we really need the activation function.
Assume that the previous layers provide the information needed to predict
BIOLOGICAL gender, given height & weight.
The possible output of the output layer, which is just the
y = mx + b formula,
ranges from -infinity to +infinity.
How would you classify this output into two classes?
The answer is to use an activation function such as sigmoid,
which squashes any range of values into the 0-1 range.
Thinking of this as a probability, we can now use a 0.5 cutoff threshold
to classify anything below 0.5 as class "0" and anything above 0.5 as class "1".
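A minimal sketch of that final step (plain Python; the raw output value is a made-up example):

    import math

    def sigmoid(z):
        # squashes any real number into the 0-1 range
        return 1.0 / (1.0 + math.exp(-z))

    raw_output = -2.3                         # unbounded output of the last linear layer
    prob = sigmoid(raw_output)                # ~0.09
    predicted_class = 1 if prob > 0.5 else 0  # 0.09 < 0.5, so class "0"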
(image: a simple neural network)
Summary
As you can see, we don't train the activation function;
it's the trainable parameters (weights & biases) that get trained!
The activation function is used to create non-linearity between layers,
and we usually choose the activation function based on our task,
such as sigmoid for binary classification.
To implement a custom activation, just create a function that receives one input and returns something:

    def custom_act(x):
        # applied element-wise to the input tensor; here it simply negates it
        return -x
In case you need it to be trainable (which you usually don't),
refer to this question, which already has a good explanation:
Pytorch custom activation functions?
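For completeness, here is a minimal sketch (my own toy model, not from the linked question) of plugging such a custom activation into a PyTorch network:

    import torch
    import torch.nn as nn

    def custom_act(x):
        return -x                                # element-wise, works on tensors

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.hidden = nn.Linear(2, 4)
            self.out = nn.Linear(4, 1)

        def forward(self, x):
            x = custom_act(self.hidden(x))       # custom activation between layers
            return torch.sigmoid(self.out(x))    # sigmoid on the output layer

    net = TinyNet()
    prob = net(torch.tensor([[170.0, 65.0]]))    # a probability in (0, 1)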
Additional Information
- What Is Neural Network
- Using Sigmoid for Logistic Regression
- Activation Function Explain For Binary Classification and Keras Implementation