Derivative of softmax function in Python

Question

Below is the softmax activation function for a neural network. What is the derivative of this function?

import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)  # keepdims so each row is divided by its own sum

Check this out https://math.stackexchange.com/questions/945871/derivative-of-softmax-loss-function — bumblebee, Mar 04 '19 at 04:19
Possible duplicate of [numpy : calculate the derivative of the softmax function](https://stackoverflow.com/questions/40575841/numpy-calculate-the-derivative-of-the-softmax-function) — desertnaut, Mar 04 '19 at 15:49

bumblebee · Answer 1 · 2019-03-04T06:00:49.557

Iterative version for softmax derivative

import numpy as np

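def softmax_grad(s):
    # Jacobian of the softmax output w.r.t. each logit (the logit is usually W_i * x).
    # Input s is the softmax value of the original input x, not x itself.
    # s.shape = (n,), i.e. a 1-D array; np.diag would not build a matrix from a (1, n) input.
    # e.g. s = np.array([0.27, 0.73]) for x = np.array([0, 1])

    # Initialize the 2-D Jacobian matrix; every entry is overwritten in the loop below.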
    jacobian_m = np.diag(s)

    for i in range(len(jacobian_m)):
        for j in range(len(jacobian_m)):
            if i == j:
                jacobian_m[i][j] = s[i] * (1-s[i])
            else: 
                jacobian_m[i][j] = -s[i]*s[j]
    return jacobian_m
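
The two branches of the loop come straight from differentiating the softmax formula. As a sketch, assuming `s` is the softmax of a single 1-D logit vector `z` (the symbols `z` and the Kronecker delta `\delta_{ij}` are notation introduced here, not part of the code above):

```latex
s_i = \frac{e^{z_i}}{\sum_k e^{z_k}},
\qquad
\frac{\partial s_i}{\partial z_j} = s_i\,(\delta_{ij} - s_j) =
\begin{cases}
  s_i\,(1 - s_i) & \text{if } i = j \\
  -\,s_i\,s_j    & \text{if } i \neq j
\end{cases}
```

Each of the n outputs depends on each of the n logits, so the derivative is an n × n Jacobian rather than a vector. On the diagonal `i == j`, so `s[i] * (1 - s[i])` and `s[i] * (1 - s[j])` are the same value.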

Vectorized version

def softmax_grad(softmax):
    # Reshape the 1-d softmax to 2-d so that np.dot will do the matrix multiplication
    s = softmax.reshape(-1,1)
    return np.diagflat(s) - np.dot(s, s.T)

Reference: https://medium.com/@aerinykim/how-to-implement-the-softmax-derivative-independently-from-any-loss-function-ae6d44363a9d
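
One way to sanity-check the vectorized formula is to compare it with a finite-difference approximation. This is only a sketch: `softmax_1d` and `numerical_jacobian` are hypothetical helpers written for this check (they are not from the answer or the linked article), and it assumes a single 1-D input rather than a batch.

```python
import numpy as np

def softmax_1d(z):
    # Hypothetical helper: softmax of a single 1-D vector of logits.
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / np.sum(e)

def softmax_grad(softmax):
    # Same vectorized formula as above: diag(s) - s s^T
    s = softmax.reshape(-1, 1)
    return np.diagflat(s) - np.dot(s, s.T)

def numerical_jacobian(f, z, eps=1e-6):
    # Hypothetical helper: central finite differences.
    # Column j approximates the derivative of every output w.r.t. z[j].
    n = z.size
    jac = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        jac[:, j] = (f(z + step) - f(z - step)) / (2 * eps)
    return jac

z = np.array([0.0, 1.0, 2.0])
analytic = softmax_grad(softmax_1d(z))
numeric = numerical_jacobian(softmax_1d, z)

print(analytic.shape)  # (3, 3)
print(np.allclose(analytic, numeric, atol=1e-8))
```

Because the softmax outputs always sum to 1, each column of the Jacobian should also sum to (approximately) zero, which is another quick check.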

why is the input an n by 1 vector and the output an n by n matrix? — Branden Keck, Aug 04 '20 at 00:24
`if i == j: jacobian_m[i][j] = s[i] * (1-s[i])` shouldn't it be `(1 - s[j])` — mLstudent33, Jun 20 '21 at 03:27
