Below is a small piece of code I am using to calculate the softmax. It works well for a single array, but with larger values such as 1000 it blows up:

import numpy as np

def softmax(x):
    print(x.shape)
    softmax1 = np.exp(x)/np.sum(np.exp(x))
    return softmax1


def test_softmax():
    print("Running your code")
    # print(softmax(np.array([1,2])))
    test1 = softmax(np.array([1,2]))
    ans1 = np.array([0.26894142,  0.73105858])
    assert np.allclose(test1, ans1, rtol=1e-05, atol=1e-06)
    print("Softmax values %s" % test1)

    test2 = softmax(np.array([[1001,1002],[3,4]]))
    print(test2)
    ans2 = np.array([
        [0.26894142, 0.73105858],
        [0.26894142, 0.73105858]])
    assert np.allclose(test2, ans2, rtol=1e-05, atol=1e-06)

if __name__ == "__main__":
    test_softmax()

I get a warning: RuntimeWarning: overflow encountered in exp, pointing at the line softmax1 = np.exp(x)/np.sum(np.exp(x))

Shivaji Dutta
  • You might be interested in http://stackoverflow.com/questions/34968722/softmax-function-python and http://stackoverflow.com/questions/42599498/numercially-stable-softmax – Warren Weckesser Apr 13 '17 at 22:44

1 Answer

Typical implementations of softmax subtract the maximum value first to avoid this overflow:

import numpy as np

def softmax(x, axis=-1):
    # save typing...
    kw = dict(axis=axis, keepdims=True)

    # make every value 0 or below, as exp(0) won't overflow
    xrel = x - x.max(**kw)

    # if you wanted better handling of small exponents, you could do something like this
    # to try and make the values as large as possible without overflowing. The 0.9
    # is a fudge factor to try and ignore rounding errors
    #
    #     xrel += np.log(np.finfo(float).max / x.shape[axis]) * 0.9

    exp_xrel = np.exp(xrel)
    return exp_xrel / exp_xrel.sum(**kw)

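Algebraically, this is exactly the same, because the common factor exp(-max) cancels between the numerator and the denominator. It ensures that the largest value ever passed into exp is 0, so the largest value exp returns is 1.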
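As a rough check of that claim, here is a minimal sketch comparing the direct formula with the shifted one. The names softmax_naive and softmax_shifted are made up for this illustration, and the arrays reuse the values from the question's test. It assumes float64 inputs, where exp overflows for arguments above roughly 709.

```python
import numpy as np

def softmax_naive(x):
    # Direct formula; exp overflows in float64 once entries exceed roughly 709.
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_shifted(x, axis=-1):
    # Subtract the maximum along the axis so the largest exponent is exactly 0.
    x = np.asarray(x, dtype=float)
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

small = np.array([[1.0, 2.0], [3.0, 4.0]])
large = np.array([[1001.0, 1002.0], [3.0, 4.0]])

# On inputs that do not overflow, the two versions agree.
print(np.allclose(softmax_naive(small), softmax_shifted(small)))

# On large inputs the shifted version stays finite, because each row
# depends only on the differences between its own entries.
print(softmax_shifted(large))
```

Both rows of the second result should come out close to [0.26894142, 0.73105858], matching ans2 in the question, since 1002 - 1001 and 4 - 3 are the same difference.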

Eric
  • Thanks for the response. The values now come out as expected, except in the test case np.array([[1001,1002],[3,4]]), where the output is [[ 0.26894142 0.73105858] [ 0. 0. ]] instead of [[0.26894142, 0.73105858], [0.26894142, 0.73105858]]. – Shivaji Dutta Apr 13 '17 at 21:36
  • Ah, hadn't realized you wanted column-wise softmax instead of array-wise. Updated – Eric Apr 14 '17 at 15:52
  • great. It helped a lot – Erfan Salavati Nov 16 '22 at 19:16