
While calculating softmax, exp(x) may become inf, causing softmax to return NaN. I saw a way to prevent this:

# --------------------------------------------------------------------------------
# exp(x-c) to prevent the infinite exp(x) for a large value x, with c = max(x).
# keepdims=True to be able to broadcast.
# --------------------------------------------------------------------------------
C = np.max(X, axis=-1, keepdims=True)
exp = np.exp(X - C)
P = np.divide(exp, np.sum(exp, axis=-1, keepdims=True))
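
For reference, here is a minimal sketch of why the shift matters (the sample values are made up; anything above roughly 709 overflows exp() in float64):

import numpy as np

x = np.array([10.0, 1000.0, 2.0])   # 1000.0 overflows exp() in float64

# Naive softmax: exp(1000) -> inf, and inf / inf -> nan
naive = np.exp(x) / np.sum(np.exp(x))
print(naive)        # [ 0. nan  0.] (plus an overflow warning)

# Shifted softmax: the largest exponent becomes exp(0) = 1, nothing overflows
e = np.exp(x - np.max(x))
print(e / np.sum(e))    # approximately [0. 1. 0.]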

Do tf.nn.softmax and tf.keras.activations.softmax take the same or similar measures internally, or do I need to make sure x will not cause this myself?


1 Answer


AFAIK, they do. For the Keras activation, check the source code of keras.activations.softmax:

...
e = math_ops.exp(x - math_ops.reduce_max(x, axis=axis, keepdims=True))
s = math_ops.reduce_sum(e, axis=axis, keepdims=True)
output = e / s
...

And I believe tf.nn.softmax does the same, though I can't confirm that by reading the source directly, because the op comes from the generated module from tensorflow.python.ops import gen_nn_ops. If you follow the source code, here, it passes the gen_nn_ops.softmax operation to def _wrap_2d_function.
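
If you want to verify empirically rather than from the source, a quick sanity check could look like this (the logit values are arbitrary, chosen to be large enough that a naive exp() would overflow float32):

import numpy as np
import tensorflow as tf

logits = tf.constant([[10.0, 100.0, 1000.0]])

print(tf.nn.softmax(logits, axis=-1).numpy())                  # [[0. 0. 1.]], no NaN
print(tf.keras.activations.softmax(logits, axis=-1).numpy())   # same result

# Matches a max-shifted NumPy softmax
x = logits.numpy()
e = np.exp(x - x.max(axis=-1, keepdims=True))
print(e / e.sum(axis=-1, keepdims=True))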
