The NaN error most likely occurs when one of the softmaxed logits gets truncated to 0, as you said, so that the cross-entropy computation ends up evaluating log(0).
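As a toy illustration of that failure mode (the extreme logits here are made up to force the underflow, and the TensorFlow 1.x API used in the rest of this answer is assumed):
import tensorflow as tf

logits = tf.constant([[200.0, 0.0]])  # extreme logits: the second softmax probability underflows to 0 in float32
labels = tf.constant([[1.0, 0.0]])    # one-hot label for the first class
probs = tf.nn.softmax(logits)         # evaluates to [[1.0, 0.0]]

loss = -tf.reduce_sum(labels * tf.log(probs))  # 0 * log(0) = 0 * (-inf) -> NaN

with tf.Session() as sess:
    print(sess.run(loss))  # nan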
To avoid this, as suggested in this other answer, you can clip the values of the softmax output so that they are never zero:
out = tf.clip_by_value(out, 1e-10, 100.0)
Or you could add a small constant to avoid having zeros:
out = out + 1e-10
The problem is that the softmax function is applied to the logits internally by sparse_softmax_cross_entropy_with_logits(), so you cannot change its behavior.
To overcome this, code the cross-entropy loss yourself and add the constant 1e-10 to the output of the softmax, not to the logits:
loss = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits) + 1e-10))
Be aware that sparse_softmax_cross_entropy_with_logits() expects labels to hold the numeric class index of each example, whereas if you implement the cross-entropy loss yourself, labels has to be the one-hot encoding of those numeric labels.
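For concreteness, here is a small end-to-end sketch comparing the built-in loss with the hand-written one (the toy logits and labels are made up, and again the TensorFlow 1.x API is assumed):
import tensorflow as tf

# Toy batch: 3 examples over 4 classes.
logits = tf.constant([[2.0, 1.0, 0.1, -1.0],
                      [0.5, 2.5, 0.3,  0.0],
                      [1.0, 0.2, 3.0,  0.7]])
numeric_labels = tf.constant([0, 1, 2])               # class indices, as expected by the sparse op
one_hot_labels = tf.one_hot(numeric_labels, depth=4)  # what the hand-written loss expects

# Built-in loss (numeric labels) versus hand-written loss (one-hot labels).
sparse_loss = tf.reduce_sum(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=numeric_labels, logits=logits))
manual_loss = -tf.reduce_sum(one_hot_labels * tf.log(tf.nn.softmax(logits) + 1e-10))

with tf.Session() as sess:
    print(sess.run([sparse_loss, manual_loss]))       # nearly identical values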
Update: I have corrected the answer thanks to the comment by @mdaoust. As he said, the zeros only matter after the softmax has been applied to the logits, not before.