I haven't seen a library implementation of softmax, although that's not proof that it doesn't exist. It's simple enough that people just write their own when they need it.
For the record, the softmax function on u1, u2, u3, ... is just the tuple (exp(u1)/Z, exp(u2)/Z, exp(u3)/Z, ...), where the normalizing constant Z is just the sum of the exponentials, Z = exp(u1) + exp(u2) + exp(u3) + ....
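As a rough illustration, the definition translates almost word for word into code. This is only a sketch in Python; the name `softmax_naive` is made up for this example, and it assumes the input is a non-empty sequence of floats small enough that `exp` doesn't overflow.

```python
import math

def softmax_naive(us):
    """Softmax straight from the definition (hypothetical helper name).

    Assumes `us` is a non-empty sequence of moderately sized floats.
    """
    # Exponentiate each input.
    exps = [math.exp(u) for u in us]
    # Z is the normalizing constant: the sum of the exponentials.
    z = sum(exps)
    # Divide each exponential by Z so the results sum to 1.
    return [e / z for e in exps]

probs = softmax_naive([1.0, 2.0, 3.0])
```

The outputs are positive, sum to 1 (up to rounding), and keep the ordering of the inputs.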
Note that adding or subtracting a constant from each u leaves the result unchanged, since it's equivalent to multiplying the numerator and denominator by the same factor. So you could make the calculation a little more numerically well-behaved by subtracting the greatest value among the u's; then the largest term exp(u) will be 1 and all the others will be no larger than that, so nothing can overflow.
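Here is a minimal sketch of the max-subtraction trick in Python. The name `softmax_stable` is just something picked for this example, and it assumes a non-empty sequence of finite floats.

```python
import math

def softmax_stable(us):
    """Softmax with the maximum subtracted first (hypothetical helper name).

    Assumes `us` is a non-empty sequence of finite floats.
    """
    # Shifting every input by the same constant does not change the result,
    # so shift by the maximum: the largest exponent becomes 0.
    m = max(us)
    # Every term is now exp(something <= 0), i.e. in the range (0, 1].
    exps = [math.exp(u - m) for u in us]
    # The sum is at least 1, because the largest term is exactly 1.
    z = sum(exps)
    return [e / z for e in exps]

# Inputs this large would overflow without the shift.
big = softmax_stable([1000.0, 1001.0, 1002.0])
# Same inputs shifted down by 1000: the result should match.
small = softmax_stable([0.0, 1.0, 2.0])
```

In CPython, calling `math.exp(1000.0)` directly raises `OverflowError`, so the unshifted formula fails on the first list, while the shifted version gives the same answer for both lists.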