This is a follow-up to another question/answer.
I want a function equivalent to this:
import numpy as np

def softmax(x, tau):
    """ Returns softmax probabilities with temperature tau
    Input:  x -- 1-dimensional array
    Output: s -- 1-dimensional array
    """
    e_x = np.exp(x / tau)
    return e_x / e_x.sum()
which is stable and robust, i.e. it doesn't overflow for small values of tau, nor for large x. Since this will be used to compute probabilities, the output should sum to 1. In other words, I am passing in some values (and a temperature) and I want as output an array of probabilities "scaled" by the input and tau.
Examples:
In [3]: softmax(np.array([2,1,1,3]), 1)
Out[3]: array([ 0.22451524, 0.08259454, 0.08259454, 0.61029569])
In [5]: softmax(np.array([2,1,1,3]), 0.1)
Out[5]: array([ 4.53978685e-05, 2.06106004e-09, 2.06106004e-09, 9.99954598e-01])
In [7]: softmax(np.array([2,1,1,3]), 5)
Out[7]: array([ 0.25914361, 0.21216884, 0.21216884, 0.31651871])
So as tau goes towards 0, the probability mass concentrates on the position of the largest element; as tau grows larger, the probabilities become closer to one another.
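For what it is worth, here is my own attempt, based on the usual max-subtraction trick (my guess, not taken from the linked answer). I am not sure it is fully robust, e.g. x / tau itself can still overflow for extremely small tau, which is partly why I am asking:

def softmax_attempt(x, tau):
    """My attempt: scale by the temperature, then shift so the largest exponent is 0."""
    z = x / tau
    z = z - np.max(z)        # all arguments to exp() are now <= 0, so no overflow
    e_z = np.exp(z)
    return e_z / e_z.sum()   # the denominator is >= 1, so no division by ~0

As far as I can tell this reproduces the example outputs above, but I don't know whether it is the "right" way or how it relates to the version from the linked answer below.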
Optionally, some side questions about the linked answer. There, Neil gives the following alternative:
def nat_to_exp(q):
    max_q = max(0.0, np.max(q))
    rebased_q = q - max_q
    return np.exp(rebased_q - np.logaddexp(-max_q, np.logaddexp.reduce(rebased_q)))
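For concreteness, this is how I call it (I am assuming the natural parameters are simply x / tau):

q = np.array([2, 1, 1, 3]) / 1.0   # natural parameters for tau = 1
p = nat_to_exp(q)
print(p, p.sum())                  # the sum comes out noticeably below 1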
However, this output does not sum to 1, and the explanation is that the function returns a categorical distribution which only has N-1 free parameters, the last one being 1 - sum(others). But upon running it, I notice that for a vector of length 3 it returns a vector of length 3. So where is the missing parameter? Can I make it equivalent to the example above?
Why is that answer stable? How does one get from the simple softmax formula to it?
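The part I do follow is the usual shift invariance of softmax,

$$\frac{e^{x_i/\tau}}{\sum_j e^{x_j/\tau}} = \frac{e^{x_i/\tau - m}}{\sum_j e^{x_j/\tau - m}} \quad \text{for any constant } m,$$

typically with $m = \max_j x_j/\tau$; what I don't see is how one gets from there to the expression with np.logaddexp and the extra -max_q term.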
Possibly related question: General softmax but without temperature