# How to implement the Softmax derivative independently from any loss function?

Mathematically, the derivative of the softmax output σ(j) with respect to the logit z_i (for example, z_i = w_i · x) is

∂σ(j)/∂z_i = σ(j) · (δ_ij − σ(i))

where δ_ij is the Kronecker delta: 1 if i = j, 0 otherwise.
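The two cases (i = j and i ≠ j) follow from the quotient rule applied to σ(j) = e^{z_j} / Σ_k e^{z_k}; a quick sketch of the derivation:

```latex
% Case i = j:
\frac{\partial \sigma(j)}{\partial z_j}
  = \frac{e^{z_j}\sum_k e^{z_k} - e^{z_j}e^{z_j}}{\left(\sum_k e^{z_k}\right)^2}
  = \sigma(j)\,(1 - \sigma(j))

% Case i \neq j:
\frac{\partial \sigma(j)}{\partial z_i}
  = \frac{-\,e^{z_j}e^{z_i}}{\left(\sum_k e^{z_k}\right)^2}
  = -\,\sigma(j)\,\sigma(i)

% Both cases combine via the Kronecker delta:
\frac{\partial \sigma(j)}{\partial z_i} = \sigma(j)\,(\delta_{ij} - \sigma(i))
```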

If you implement it iteratively:

```python
import numpy as np

def softmax_grad(s):
    # Take the derivative of each softmax element w.r.t. each logit (usually w_i * x).
    # Input s is the softmax value of the original input x.
    # s.shape = (n,)
    # e.g. s = np.array([0.3, 0.7]), x = np.array([0, 1])

    # Initialize the 2-D Jacobian matrix.
    jacobian_m = np.diag(s)
    for i in range(len(jacobian_m)):
        for j in range(len(jacobian_m)):
            if i == j:
                jacobian_m[i][j] = s[i] * (1 - s[i])
            else:
                jacobian_m[i][j] = -s[i] * s[j]
    return jacobian_m
```

Let’s test.

```python
In [95]: x = np.array([1, 2])

def softmax(z):
    z = z - np.max(z)  # subtract max for numerical stability (avoid mutating the caller's array)
    sm = (np.exp(z).T / np.sum(np.exp(z), axis=0)).T
    return sm

In [96]: softmax(x)
Out[96]: array([ 0.26894142,  0.73105858])

In [97]: softmax_grad(softmax(x))
Out[97]:
array([[ 0.19661193, -0.19661193],
       [-0.19661193,  0.19661193]])
```

And here is a vectorized implementation:

```python
def softmax_grad(softmax):
    # Reshape the 1-D softmax to 2-D so that np.dot computes the outer product s sᵀ.
    s = softmax.reshape(-1, 1)
    return np.diagflat(s) - np.dot(s, s.T)

In [17]: soft_max = softmax(x)

In [18]: softmax_grad(soft_max)
Out[18]:
array([[ 0.19661193, -0.19661193],
       [-0.19661193,  0.19661193]])
```
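A good habit when deriving Jacobians by hand is to verify them numerically. The sketch below (my addition, not from the original post) checks the vectorized analytic Jacobian against central finite differences:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    return np.exp(z) / np.sum(np.exp(z))

def softmax_grad(s):
    # Vectorized Jacobian: diag(s) - s sᵀ
    s = s.reshape(-1, 1)
    return np.diagflat(s) - np.dot(s, s.T)

x = np.array([1.0, 2.0, 3.0])
analytic = softmax_grad(softmax(x))

# Build the numerical Jacobian column by column with central differences.
eps = 1e-6
numeric = np.zeros((len(x), len(x)))
for i in range(len(x)):
    d = np.zeros_like(x)
    d[i] = eps
    # Column i holds d softmax / d z_i
    numeric[:, i] = (softmax(x + d) - softmax(x - d)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-8))  # → True
```

Because the softmax Jacobian is symmetric (J[i][j] = −σ(i)σ(j) off the diagonal), it does not matter whether the finite-difference perturbation fills rows or columns.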

I’m an Engineering Manager at Scale AI and this is my notepad for Applied Math / CS / Deep Learning topics. Follow me on Twitter for more!
