When computing softmax, exp(x) can overflow to inf for large x, which makes the softmax return NaN. I saw a way to prevent this:
# --------------------------------------------------------------------------------
# exp(x - c) prevents exp(x) from overflowing for a large value x, with c = max(x).
# Subtracting the row-wise max does not change the result, since
# softmax(x) == softmax(x - c) for any constant c.
# keepdims=True keeps the reduced axis so C and the sum broadcast against X.
# --------------------------------------------------------------------------------
import numpy as np

C = np.max(X, axis=-1, keepdims=True)
exp = np.exp(X - C)
P = exp / np.sum(exp, axis=-1, keepdims=True)
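For example, with logits large enough to overflow a plain np.exp, the shifted version stays finite (a quick check, assuming X is a 2-D array of logits):

X = np.array([[1000.0, 1001.0, 1002.0]])   # np.exp(1000.0) alone would overflow to inf
C = np.max(X, axis=-1, keepdims=True)
P = np.exp(X - C) / np.sum(np.exp(X - C), axis=-1, keepdims=True)
print(P)          # [[0.09003057 0.24472847 0.66524096]] -- no NaN
print(np.exp(X))  # [[inf inf inf]] with an overflow RuntimeWarning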
Do tf.nn.softmax and tf.keras.activations.softmax take similar (or the same) measures internally, or do I need to make sure x will not cause an overflow myself?
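If you want to check the behavior yourself rather than rely on the documentation, a quick sanity check (assuming TensorFlow 2.x) is to feed both functions logits that would overflow a naive exp:

import tensorflow as tf

logits = tf.constant([[1000.0, 1001.0, 1002.0]])
print(tf.nn.softmax(logits))                 # finite probabilities, no NaN
print(tf.keras.activations.softmax(logits))  # same values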