Questions tagged [softmax]

Use this tag for programming-related questions about the softmax function, also known as the normalized exponential function. Questions specific to a certain programming language should also be tagged with that language.

534 questions
303
votes
26 answers

How to implement the Softmax function in Python

From Udacity's deep learning class, the softmax of y_i is simply the exponential divided by the sum of exponentials over the whole Y vector: S(y_i) = e^{y_i} / Σ_j e^{y_j}, where S(y_i) is the softmax function of y_i, e is the exponential, and j is the no. of columns in the…
alvas
  • 115,346
  • 109
  • 446
  • 738
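A minimal numpy sketch of the formula as the question states it (each exponential divided by the sum of exponentials of the whole vector); the example scores are illustrative:

    import numpy as np

    def softmax(y):
        # S(y_i) = e^{y_i} / sum_j e^{y_j}, computed over the whole vector
        exps = np.exp(y)
        return exps / np.sum(exps)

    scores = np.array([3.0, 1.0, 0.2])
    print(softmax(scores))        # approximately [0.836 0.113 0.051]
    print(softmax(scores).sum())  # 1.0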
227
votes
10 answers

Why use softmax as opposed to standard normalization?

In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution: This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are…
Tom
  • 6,601
  • 12
  • 40
  • 48
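A small numpy comparison of the two options: plain normalization (x / x.sum()) can produce negative "probabilities" and divides by zero whenever the entries cancel, while softmax always yields a positive distribution; the vector below is an illustrative assumption:

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))   # max shift for numerical safety
        return e / np.sum(e)

    x = np.array([2.0, 1.0, -1.0])

    # Plain normalization yields a negative "probability" here
    # (and would divide by zero if the entries summed to zero):
    print(x / np.sum(x))   # [ 1.   0.5 -0.5]

    # Softmax is positive and sums to 1, regardless of sign:
    print(softmax(x))      # approximately [0.705 0.26  0.035]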
126
votes
3 answers

What's the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?

I recently came across tf.nn.sparse_softmax_cross_entropy_with_logits and I cannot figure out what the difference is compared to tf.nn.softmax_cross_entropy_with_logits. Is the only difference that the training vectors y have to be one-hot encoded when…
daniel451
  • 10,626
  • 19
  • 67
  • 125
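A hedged numpy sketch of the distinction: both functions compute the same cross-entropy, but the dense variant takes full (e.g. one-hot) label distributions while the sparse variant takes integer class indices; the logits and labels below are made up for illustration:

    import numpy as np

    def log_softmax(logits):
        m = logits.max(axis=-1, keepdims=True)
        return logits - m - np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))

    logits = np.array([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
    sparse_labels = np.array([0, 1])          # integer class ids per example
    onehot_labels = np.eye(3)[sparse_labels]  # the same labels, one-hot

    # softmax_cross_entropy_with_logits takes a full label distribution:
    dense_loss = -(onehot_labels * log_softmax(logits)).sum(axis=-1)

    # sparse_softmax_cross_entropy_with_logits takes the class indices:
    sparse_loss = -log_softmax(logits)[np.arange(2), sparse_labels]

    print(np.allclose(dense_loss, sparse_loss))  # True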
47
votes
1 answer

Why should we use Temperature in softmax?

I'm currently working on CNNs and I want to know: what is the function of temperature in the softmax formula? And why should we use high temperatures to get a softer probability distribution? [Softmax formula]
Sara
  • 592
  • 1
  • 4
  • 13
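A small numpy sketch of what temperature does, assuming the common formulation softmax(logits / T): T > 1 flattens the distribution and T < 1 sharpens it; the logits are illustrative:

    import numpy as np

    def softmax_t(logits, T=1.0):
        z = logits / T              # temperature rescales the logits
        e = np.exp(z - np.max(z))
        return e / np.sum(e)

    logits = np.array([2.0, 1.0, 0.1])
    print(softmax_t(logits, T=0.5))  # sharper, close to argmax
    print(softmax_t(logits, T=1.0))  # the standard softmax
    print(softmax_t(logits, T=5.0))  # softer, close to uniform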
40
votes
1 answer

Should I use softmax as output when using cross entropy loss in pytorch?

I have a classification problem using a fully connected deep neural net with 2 hidden layers for the MNIST dataset in PyTorch. I want to use tanh as the activation in both hidden layers, but at the end I should use softmax. For the loss, I am choosing…
pikachu
  • 690
  • 1
  • 6
  • 17
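A minimal PyTorch sketch of the usual answer: nn.CrossEntropyLoss already combines log-softmax and negative log-likelihood, so the network should output raw logits with no softmax layer; the layer widths below are illustrative assumptions:

    import torch
    import torch.nn as nn

    # nn.CrossEntropyLoss = log-softmax + NLL loss, so the model
    # ends in raw logits, with no softmax applied:
    model = nn.Sequential(
        nn.Linear(784, 256), nn.Tanh(),   # tanh in the hidden layers
        nn.Linear(256, 128), nn.Tanh(),
        nn.Linear(128, 10),               # raw logits for 10 MNIST classes
    )
    criterion = nn.CrossEntropyLoss()

    x = torch.randn(32, 784)               # dummy batch of flattened images
    targets = torch.randint(0, 10, (32,))  # integer class labels
    loss = criterion(model(x), targets)
    loss.backward()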
40
votes
4 answers

RuntimeWarning: invalid value encountered in greater

I tried to implement softmax with the following code (out_vec is a numpy vector of floats): numerator = np.exp(out_vec) denominator = np.sum(np.exp(out_vec)) out_vec = numerator/denominator However, I got an overflow error because of…
Cheshie
  • 2,777
  • 6
  • 32
  • 51
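The standard fix, sketched in numpy: subtract the maximum before exponentiating; the shift cancels in the ratio, so the result is unchanged but np.exp can no longer overflow:

    import numpy as np

    def softmax(out_vec):
        shifted = out_vec - np.max(out_vec)  # the largest exponent is now 0
        numerator = np.exp(shifted)          # np.exp can no longer overflow
        return numerator / np.sum(numerator)

    big = np.array([1000.0, 1001.0, 1002.0])
    print(softmax(big))  # approximately [0.09 0.245 0.665], no warnings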
39
votes
4 answers

Numerically stable softmax

Is there a numerically stable way to compute the softmax function below? I am getting values that become NaNs in my neural network code. np.exp(x)/np.sum(np.exp(x))
Abhishek Bhatia
  • 9,404
  • 26
  • 87
  • 142
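When a log is taken downstream anyway (as in cross-entropy), computing log-softmax with the log-sum-exp trick sidesteps both overflow and underflow; a minimal numpy sketch with illustrative inputs:

    import numpy as np

    def log_softmax(x):
        m = np.max(x)
        # log(sum(exp(x))) = m + log(sum(exp(x - m)))  -- the log-sum-exp trick
        return x - m - np.log(np.sum(np.exp(x - m)))

    x = np.array([-1000.0, -1000.5, -1001.0])
    print(log_softmax(x))          # finite log-probabilities
    print(np.exp(log_softmax(x)))  # a valid distribution, no NaNs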
32
votes
4 answers

CS231n: How to calculate gradient for Softmax loss function?

I am watching some videos for Stanford CS231n: Convolutional Neural Networks for Visual Recognition but do not quite understand how to calculate the analytical gradient of the softmax loss function using numpy. From this stackexchange answer, softmax…
Nghia Tran
  • 809
  • 1
  • 7
  • 14
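A hedged numpy sketch of the result usually derived there: for softmax combined with cross-entropy loss, the gradient with respect to the scores reduces to the probabilities minus the one-hot target; the scores below are illustrative:

    import numpy as np

    scores = np.array([3.2, 5.1, -1.7])  # class scores for one example
    y = 0                                # index of the correct class

    p = np.exp(scores - np.max(scores))
    p /= np.sum(p)                       # softmax probabilities

    loss = -np.log(p[y])                 # cross-entropy loss
    grad = p.copy()
    grad[y] -= 1.0                       # dL/dscores = p - one_hot(y)
    print(loss, grad)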
26
votes
4 answers

Difference between logistic regression and softmax regression

I know that logistic regression is for binary classification and softmax regression is for multi-class problems. Would there be any difference if I trained several logistic regression models with the same data and normalized their results to get a…
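One concrete piece of the relationship can be checked directly: for two classes, a softmax over the logits [z, 0] reproduces the logistic sigmoid, as this small numpy sketch (not the full answer about redundant parameters) shows:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / np.sum(e)

    z = 1.7
    print(sigmoid(z))                      # 0.8455...
    print(softmax(np.array([z, 0.0]))[0])  # the same value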
22
votes
1 answer

Scalable, Efficient Hierarchical Softmax in Tensorflow?

I'm interested in implementing a hierarchical softmax model that can handle large vocabularies, say on the order of 10M classes. What is the best way to do this that is both scalable to large class counts and efficient? For instance, at least one…
Wesley Tansey
  • 4,555
  • 10
  • 42
  • 69
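A toy numpy sketch of the two-level factorization that hierarchical softmax builds on, p(class) = p(cluster) · p(class | cluster): each prediction touches only one cluster's output block instead of all classes. The sizes, random weights, and clustering below are illustrative assumptions, not a scalable implementation:

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / np.sum(e)

    rng = np.random.default_rng(0)
    D, n_clusters, per_cluster = 16, 4, 5    # 4 * 5 = 20 "classes" (toy sizes)

    W_cluster = rng.normal(size=(D, n_clusters))
    W_class = rng.normal(size=(n_clusters, D, per_cluster))

    def class_prob(h, c, k):
        # p(class) = p(cluster c | h) * p(class k | cluster c, h)
        p_cluster = softmax(h @ W_cluster)[c]
        p_within = softmax(h @ W_class[c])[k]
        return p_cluster * p_within

    h = rng.normal(size=D)
    # The total probability over all classes is still 1:
    total = sum(class_prob(h, c, k)
                for c in range(n_clusters) for k in range(per_cluster))
    print(total)   # ~1.0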
21
votes
2 answers

Implementation of a softmax activation function for neural networks

I am using a Softmax activation function in the last layer of a neural network. But I have problems with a safe implementation of this function. A naive implementation would be this one: Vector y = mlp(x); // output of the neural network without…
alfa
  • 3,058
  • 3
  • 25
  • 36
21
votes
5 answers

Why use softmax only in the output layer and not in hidden layers?

Most examples of neural networks for classification tasks I've seen use a softmax layer as the output activation function. Normally, the other hidden units use a sigmoid, tanh, or ReLU function as the activation function. Using the softmax function here…
20
votes
3 answers

numpy : calculate the derivative of the softmax function

I am trying to understand backpropagation in a simple 3-layered neural network with MNIST. There is the input layer with weights and a bias. The labels are MNIST, so it's a 10-class vector. The second layer is a linear transform. The third layer is…
Sam Hammamy
  • 10,819
  • 10
  • 56
  • 94
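A numpy sketch of the derivative being asked for: the Jacobian of softmax is diag(p) - p pᵀ, i.e. dS_i/dx_j = p_i (δ_ij - p_j), which a finite-difference check confirms:

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / np.sum(e)

    def softmax_jacobian(x):
        p = softmax(x)
        return np.diag(p) - np.outer(p, p)   # dS_i/dx_j = p_i*(delta_ij - p_j)

    x = np.array([1.0, 2.0, 3.0])
    J = softmax_jacobian(x)

    # Quick finite-difference check of one column:
    eps = 1e-6
    x2 = x.copy(); x2[1] += eps
    print(J[:, 1])
    print((softmax(x2) - softmax(x)) / eps)  # approximately equal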
18
votes
1 answer

Binary classification with Softmax

I am training a binary classifier using a sigmoid activation function with binary cross-entropy, which gives good accuracy, around 98%. When I train the same model using softmax with categorical_crossentropy, I get very low accuracy (< 40%). I am passing the…
AKSHAYAA VAIDYANATHAN
  • 2,715
  • 7
  • 30
  • 51
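One frequent cause, sketched in numpy below (an assumption, since the model code is truncated): a softmax output layer with a single unit always outputs 1.0 and so cannot learn anything, and categorical_crossentropy needs one output unit per class with one-hot encoded labels:

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x, axis=-1, keepdims=True))
        return e / np.sum(e, axis=-1, keepdims=True)

    # A 1-unit softmax output is 1.0 for every input, so nothing is learned:
    print(softmax(np.array([[3.7], [-2.1]])))  # [[1.] [1.]]

    # For softmax + categorical_crossentropy, use 2 output units and
    # one-hot encoded labels instead:
    labels = np.array([1, 0])
    print(np.eye(2)[labels])  # [[0. 1.] [1. 0.]]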
16
votes
2 answers

Tensorflow: Hierarchical Softmax Implementation

I currently have text inputs represented as vectors, and I want to classify their categories. Because they are multi-level categories, I mean to use hierarchical softmax. Example: - Computer Science - Machine Learning - NLP -…
Viet Phan
  • 1,999
  • 3
  • 23
  • 40