What is the purpose of adding a bias b to the softmax function in a CNN?

David Parks

JaG
-
Your question needs some refinement. The equation you're showing (you should type it, not link to images) is a fully connected network operation, not a CNN. And neither of these seems to be directly related to softmax in the context of your question. Could you update your question with more detail, please? – David Parks Aug 30 '18 at 19:50
1 Answer
The formula you linked is a standard affine transformation preceding the application of a pointwise nonlinearity, not the softmax activation function itself. If you'd like to know why a bias term is used in neural networks, please refer to this post: Role of Bias in Neural Networks
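To make the affine step concrete, here is a minimal NumPy sketch of `z = W x + b`. The shapes, the random values, and the helper name `affine` are assumptions chosen for illustration; they are not taken from the formula in the question.

```python
import numpy as np

# Hypothetical sizes: 4 input features, 3 output classes.
rng = np.random.default_rng(0)
x = rng.normal(size=4)            # feature vector (e.g. flattened CNN features)
W = rng.normal(size=(3, 4))       # weight matrix (illustrative values)
b = np.array([0.5, -1.0, 2.0])    # bias vector (illustrative values)

def affine(x, W, b):
    """Affine transformation: a linear map plus a constant offset."""
    return W @ x + b

z_no_bias = affine(x, W, np.zeros(3))  # purely linear: W @ x
z = affine(x, W, b)                    # logits shifted by the bias

# The bias shifts each logit by a learned constant that does not depend on x.
print(z - z_no_bias)  # equals b up to floating-point error
```

Without the bias, an all-zero input always yields all-zero logits; the bias lets each output unit have a nonzero baseline independent of the input.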

Pranav Vempati
-
Is the affine transformation just applied before the softmax function? What is the goal of using an affine transformation before a softmax function? – JaG Aug 31 '18 at 13:57
-
The softmax function deterministically maps unscaled logits (the output of the affine transformation) to normalized probability distributions. Thus, the predictions emitted by a softmax activation function can be interpreted as class probabilities. – Pranav Vempati Aug 31 '18 at 15:40
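As a rough illustration of that mapping, the sketch below implements softmax in NumPy. The logit values are made up, and subtracting the maximum is a common numerical-stability choice assumed here, not something stated in the thread.

```python
import numpy as np

def softmax(z):
    """Map a vector of logits to a probability distribution."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)       # avoids overflow in exp; does not change the result
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical output of the affine step
probs = softmax(logits)

print(probs)        # non-negative values, largest for the largest logit
print(probs.sum())  # sums to 1 up to floating-point error
```

One consequence worth noting: adding the same constant to every logit leaves the softmax output unchanged, so only the differences between the per-class biases affect the predicted probabilities.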