Questions tagged [softmax]
534 questions

Use this tag for programming-related questions about the softmax function, also known as the normalized exponential function. Questions specific to a certain programming language should also be tagged with that language.
303 votes · 26 answers
How to implement the Softmax function in Python
From Udacity's deep learning class, the softmax of y_i is simply its exponential divided by the sum of the exponentials of the whole Y vector:

S(y_i) = e^{y_i} / \sum_j e^{y_j}

where S(y_i) is the softmax of y_i, e is the exponential, and j is the no. of columns in the…
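A minimal NumPy sketch of that definition (the max subtraction is an optional guard against overflow, not part of the course formula itself):

```python
import numpy as np

def softmax(y):
    """S(y_i) = e^{y_i} / sum_j e^{y_j} for a 1-D vector y."""
    e = np.exp(y - np.max(y))  # shifting by the max avoids overflow; result is unchanged
    return e / e.sum()

print(softmax(np.array([3.0, 1.0, 0.2])))  # -> ~[0.836, 0.113, 0.051]
```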

alvas (115,346 reputation · 109 gold · 446 silver · 738 bronze)
227 votes · 10 answers
Why use softmax as opposed to standard normalization?
In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution:

P(y = j) = e^{z_j} / \sum_k e^{z_k}

This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are…
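A small demo of the difference, assuming "standard normalization" means dividing each entry by the sum: linear normalization breaks down as soon as a logit is negative, while softmax always yields a valid distribution:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def linear_norm(x):
    return x / x.sum()  # the simple normalization the question proposes

x = np.array([1.0, 2.0, -2.0])
print(softmax(x))      # ~[0.265, 0.721, 0.013]: positive, sums to 1
print(linear_norm(x))  # [1., 2., -2.]: a negative "probability"
```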

Tom (6,601 reputation · 12 gold · 40 silver · 48 bronze)
126 votes · 3 answers
What's the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?
I recently came across tf.nn.sparse_softmax_cross_entropy_with_logits and I cannot figure out how it differs from tf.nn.softmax_cross_entropy_with_logits.
Is the only difference that training vectors y have to be one-hot encoded when…
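A small TensorFlow 2 sketch of the difference (the 1.x call signature differs slightly): the sparse variant takes integer class indices, the dense variant takes one-hot (or soft) label distributions, and on one-hot labels the two agree:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.3, 2.5, 0.2]])

# sparse variant: labels are integer class indices, shape (batch,)
sparse_labels = tf.constant([0, 1])
loss_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)

# dense variant: labels are full distributions, shape (batch, classes)
dense_labels = tf.one_hot(sparse_labels, depth=3)
loss_dense = tf.nn.softmax_cross_entropy_with_logits(
    labels=dense_labels, logits=logits)

print(loss_sparse.numpy(), loss_dense.numpy())  # identical for one-hot labels
```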

daniel451 (10,626 reputation · 19 gold · 67 silver · 125 bronze)
47 votes · 1 answer
Why should we use Temperature in softmax?
I've recently been working on CNNs and I want to know: what is the function of temperature in the softmax formula, and why should we use high temperatures to get a softer probability distribution?

S(y_i) = e^{y_i / T} / \sum_j e^{y_j / T}
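A quick NumPy sketch of the effect: dividing the logits by a temperature T > 1 flattens the distribution, while T < 1 sharpens it:

```python
import numpy as np

def softmax_t(y, T=1.0):
    # S(y_i) = e^{y_i / T} / sum_j e^{y_j / T}
    z = y / T
    e = np.exp(z - z.max())  # max shift for numerical safety
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax_t(logits, T=1.0))   # ~[0.66, 0.24, 0.10]  (sharper)
print(softmax_t(logits, T=10.0))  # ~[0.37, 0.33, 0.30]  (softer)
```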

Sara (592 reputation · 1 gold · 4 silver · 13 bronze)
40 votes · 1 answer
Should I use softmax as output when using cross entropy loss in pytorch?
I have a classification problem: a fully connected deep neural net with 2 hidden layers for the MNIST dataset in PyTorch.
I want to use tanh as the activation in both hidden layers, but in the end I should use softmax.
For the loss, I am choosing…
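A sketch of the usual PyTorch convention (hidden sizes are made up for illustration): nn.CrossEntropyLoss applies log_softmax internally, so the model should emit raw logits and not end with a softmax layer:

```python
import torch
import torch.nn as nn

# nn.CrossEntropyLoss combines log_softmax and NLLLoss internally,
# so the model should return raw logits: no softmax at the end.
model = nn.Sequential(
    nn.Linear(784, 256), nn.Tanh(),   # hidden sizes 256/128 are made up
    nn.Linear(256, 128), nn.Tanh(),
    nn.Linear(128, 10),               # 10 raw class scores (logits)
)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)               # a fake flattened MNIST batch
targets = torch.randint(0, 10, (32,))  # integer class labels
loss = criterion(model(x), targets)
loss.backward()

# apply softmax only at inference time, if probabilities are needed:
probs = torch.softmax(model(x), dim=1)
```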

pikachu (690 reputation · 1 gold · 6 silver · 17 bronze)
40 votes · 4 answers
RuntimeWarning: invalid value encountered in greater
I tried to implement softmax with the following code (out_vec is a NumPy vector of floats):
numerator = np.exp(out_vec)
denominator = np.sum(np.exp(out_vec))
out_vec = numerator/denominator
However, I got an overflow error because of…
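The usual fix is to shift the vector by its maximum before exponentiating: softmax is unchanged by a constant shift, and all exponents become non-positive. A sketch:

```python
import numpy as np

out_vec = np.array([1000.0, 1001.0, 1002.0])  # np.exp overflows on these

shifted = out_vec - np.max(out_vec)  # softmax(v) == softmax(v - const)
numerator = np.exp(shifted)          # every exponent is now <= 0
denominator = np.sum(numerator)
out_vec = numerator / denominator
print(out_vec)  # ~[0.090, 0.245, 0.665], no RuntimeWarning
```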

Cheshie (2,777 reputation · 6 gold · 32 silver · 51 bronze)
39 votes · 4 answers
Numerically stable softmax
Is there a numerically stable way to compute the softmax function below?
I am getting values that become NaNs in my neural network code.
np.exp(x)/np.sum(np.exp(x))
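One common approach, sketched below, works in log space via logsumexp; SciPy also ships ready-made scipy.special.softmax (since 1.2) alongside scipy.special.logsumexp:

```python
import numpy as np
from scipy.special import logsumexp  # stable log(sum(exp(x)))

def log_softmax(x):
    # log softmax(x)_i = x_i - logsumexp(x); exponentiate to recover softmax
    return x - logsumexp(x)

x = np.array([1000.0, 1.0, -1000.0])
print(np.exp(log_softmax(x)))  # [1. 0. 0.] with no NaNs or overflow warnings
```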

Abhishek Bhatia (9,404 reputation · 26 gold · 87 silver · 142 bronze)
32 votes · 4 answers
CS231n: How to calculate gradient for Softmax loss function?
I am watching some videos from Stanford CS231n: Convolutional Neural Networks for Visual Recognition, but I do not quite understand how to calculate the analytical gradient of the softmax loss function using numpy.
From this stackexchange answer, softmax…
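A sketch of the standard result (as derived in the CS231n notes): with cross-entropy on softmax outputs, the gradient with respect to the raw scores is simply probs - one_hot(labels), averaged over the batch:

```python
import numpy as np

def softmax_loss(scores, labels):
    # scores: (N, C) raw class scores; labels: (N,) integer class indices.
    # Key identity: dL/dscores = (softmax(scores) - one_hot(labels)) / N.
    N = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)  # stability shift
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), labels]).mean()
    grad = probs.copy()
    grad[np.arange(N), labels] -= 1.0   # subtract 1 at each true class
    grad /= N
    return loss, grad

loss, grad = softmax_loss(np.array([[2.0, 1.0, 0.1]]), np.array([0]))
print(loss, grad)  # each row of grad sums to 0
```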

Nghia Tran (809 reputation · 1 gold · 7 silver · 14 bronze)
26 votes · 4 answers
Difference between logistic regression and softmax regression
I know that logistic regression is for binary classification and softmax regression for multi-class problems. Would there be any difference if I trained several logistic regression models with the same data and normalized their results to get a…
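For the two-class case the equivalence can be written out: a softmax over two linear scores collapses to a logistic sigmoid of their difference, so only that difference is identifiable:

```latex
% Two-class softmax collapses to the logistic sigmoid:
\[
P(y = 1 \mid x)
  = \frac{e^{w_1^\top x}}{e^{w_0^\top x} + e^{w_1^\top x}}
  = \frac{1}{1 + e^{-(w_1 - w_0)^\top x}}
  = \sigma\big((w_1 - w_0)^\top x\big),
\]
% so softmax regression with two classes is an over-parameterized
% logistic regression: only the difference w_1 - w_0 matters.
```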

Xuan Wang (263 reputation · 1 gold · 3 silver · 5 bronze)
22 votes · 1 answer
Scalable, Efficient Hierarchical Softmax in Tensorflow?
I'm interested in implementing a hierarchical softmax model that can handle large vocabularies, say on the order of 10M classes. What is the best way to do this so that it is both scalable to large class counts and efficient? For instance, at least one…
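A toy NumPy sketch of the usual two-level factorization (cluster sizes and weights are made up; a real TensorFlow version would gather only the active cluster's weight block): p(class) = p(cluster) * p(class | cluster), so each example touches O(n_clusters + n_per) outputs instead of all n_clusters * n_per:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64                        # hidden size (made up)
n_clusters, n_per = 100, 100  # 100 * 100 = 10k classes; scale analogously

W_cluster = rng.normal(size=(D, n_clusters))      # level 1: pick a cluster
W_leaf = rng.normal(size=(n_clusters, D, n_per))  # level 2: pick within it

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hsm_log_prob(h, cluster_id, leaf_id):
    # log p(class) = log p(cluster | h) + log p(leaf | cluster, h);
    # only one cluster's weight block is touched per example.
    log_p_cluster = np.log(softmax(h @ W_cluster)[cluster_id])
    log_p_leaf = np.log(softmax(h @ W_leaf[cluster_id])[leaf_id])
    return log_p_cluster + log_p_leaf

h = rng.normal(size=D)
print(hsm_log_prob(h, cluster_id=3, leaf_id=42))
```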

Wesley Tansey (4,555 reputation · 10 gold · 42 silver · 69 bronze)
21 votes · 2 answers
Implementation of a softmax activation function for neural networks
I am using a Softmax activation function in the last layer of a neural network. But I have problems with a safe implementation of this function.
A naive implementation would be this one:
Vector y = mlp(x); // output of the neural network without…
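The standard guard is to subtract the maximum output before exponentiating; the identity below shows why the result is unchanged while every exponent becomes non-positive:

```latex
% Softmax is invariant to subtracting a constant m from every input:
\[
\frac{e^{y_i - m}}{\sum_j e^{y_j - m}}
  = \frac{e^{-m}\, e^{y_i}}{e^{-m} \sum_j e^{y_j}}
  = \frac{e^{y_i}}{\sum_j e^{y_j}},
\qquad m = \max_j y_j,
\]
% and with this choice of m every exponent is <= 0, so exp cannot overflow.
```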

alfa (3,058 reputation · 3 gold · 25 silver · 36 bronze)
21 votes · 5 answers
Why use softmax only in the output layer and not in hidden layers?
Most examples of neural networks for classification tasks I've seen use a softmax layer as the output activation function. Normally, the other hidden units use a sigmoid, tanh, or ReLU function as the activation function. Using the softmax function here…
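One concrete way to see the objection: softmax outputs always sum to 1 and are invariant to shifting all inputs by a constant, so a hidden softmax layer couples its units together and discards overall magnitude, information a hidden representation usually needs to keep:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

h = np.array([1.0, 2.0, 3.0])
print(softmax(h))        # ~[0.090, 0.245, 0.665], sums to 1
print(softmax(h + 100))  # identical: the overall activation level is lost
```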

beyeran (885 reputation · 1 gold · 8 silver · 26 bronze)
20 votes · 3 answers
numpy: calculate the derivative of the softmax function
I am trying to understand backpropagation in a simple 3-layer neural network with MNIST.
There is the input layer with weights and a bias. The labels are MNIST, so it's a 10-class vector.
The second layer is a linear transform. The third layer is…
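A sketch of the derivative itself: the Jacobian of softmax is diag(s) - s s^T, i.e. dS_i/dx_j = S_i (delta_ij - S_j):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_jacobian(x):
    # J_ij = s_i * (delta_ij - s_j)  =  diag(s) - outer(s, s)
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

x = np.array([0.5, 1.5, -0.5])
J = softmax_jacobian(x)
print(J.sum(axis=0))  # columns sum to ~0: outputs always sum to 1
```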

Sam Hammamy (10,819 reputation · 10 gold · 56 silver · 94 bronze)
18 votes · 1 answer
Binary classification with Softmax
I am training a binary classifier using a sigmoid activation function with binary cross-entropy, which gives good accuracy of around 98%.
When I train the same model using softmax with categorical_crossentropy, I get very low accuracy (< 40%).
I am passing the…
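The two setups should be mathematically equivalent; a frequent cause of the gap described is feeding plain 0/1 integer labels to categorical_crossentropy, which expects one-hot vectors (e.g. via keras.utils.to_categorical). A Keras sketch of both, with made-up layer sizes:

```python
from tensorflow import keras

n_features = 20  # made-up input size

# Option A: one sigmoid unit + binary_crossentropy; labels are plain 0/1
model_a = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(n_features,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model_a.compile(optimizer='adam', loss='binary_crossentropy',
                metrics=['accuracy'])

# Option B: two softmax units + categorical_crossentropy;
# labels must be one-hot, e.g. keras.utils.to_categorical(y, 2)
model_b = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(n_features,)),
    keras.layers.Dense(2, activation='softmax'),
])
model_b.compile(optimizer='adam', loss='categorical_crossentropy',
                metrics=['accuracy'])
```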

AKSHAYAA VAIDYANATHAN (2,715 reputation · 7 gold · 30 silver · 51 bronze)
16 votes · 2 answers
Tensorflow: Hierarchical Softmax Implementation
I currently have text inputs represented as vectors, and I want to classify their categories. Because they are multi-level categories, I intend to use Hierarchical Softmax.
Example:
- Computer Science
- Machine Learning
- NLP
-…

Viet Phan (1,999 reputation · 3 gold · 23 silver · 40 bronze)