
I am getting NaN when I attempt to use the `sparse_softmax_cross_entropy_with_logits` loss function in TensorFlow. I have a simple network, something like:

layer = tf.nn.relu(tf.matmul(inputs, W1) + b1)
layer = tf.nn.relu(tf.matmul(layer, W2) + b2)
logits = tf.matmul(layer, W3) + b3
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

I have many classes (~10000), so I imagine I am getting NaN because the logit corresponding to the correct class in at least one of my examples got truncated to zero. Is there a way to avoid this?

Davis Yoshida

3 Answers


It actually turns out that some of my labels were out of range (e.g. a label of 14000 when my logits matrix is only 150 x 10000). This produces a NaN rather than an error.
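
If you run into this, a quick sanity check on the label array catches it before the loss is ever computed. This is just a sketch with placeholder values, not part of the original fix:

import numpy as np
labels_np = np.array([3, 14000, 42])   # 14000 is outside [0, num_classes)
num_classes = 10000                    # logits have shape [batch_size, num_classes]
bad = (labels_np < 0) | (labels_np >= num_classes)
if bad.any():
    raise ValueError("out-of-range label indices: %s" % labels_np[bad])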

Davis Yoshida
  • Can you explain what you mean by "labels out of range"? I thought that for each sample the labels are a vector whose length matches the logits. I tried `a = tf.constant(np.array([[200.1, 20000.3, .5, .9], [1.0, 10000.0, 10.0, 10.0]])) l = tf.constant(np.array([[1, 1, 1, 1, 1], [1, 0, 0]])) s.run(tf.nn.softmax_cross_entropy_with_logits(logits=a, labels=l))`. When the dimensions don't match it complains about the dimensions, and if the probabilities sum to more than 1 it causes no error or `NaN`. What do you mean by "a label of 14000"? – teddy Aug 06 '17 at 04:10
  • The difference is that I was using `tf.sparse_softmax_cross_entropy_with_logits` so the inputs are the index of the label. When I say out of range, I mean I supplied (e.g.) the index 23, while only providing 7 logits to the function for each example. – Davis Yoshida Aug 07 '17 at 21:40

tf.nn.sparse_softmax_cross_entropy_with_logits handles the case of log(0) for you, so you don't have to worry about it.

Usually a NaN is due to a high learning rate in your optimization algorithm. Try lowering it until the NaN errors disappear and the loss starts to decrease.
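
For example (just a sketch, not the asker's code; the optimizer and the learning rate value are placeholders):

learning_rate = 1e-4  # e.g. reduced from 1e-2
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss)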

nessuno

The NaN error probably occurs when one of the softmaxed logits gets truncated to 0, as you said, and the cross-entropy computation then takes log(0).

To avoid this, as suggested in this other answer, you could clip the values of the softmax output so that they are never zero.

out = tf.clip_by_value(out, 1e-10, 100.0)

Or you could add a small constant to avoid having zeros:

out = out + 1e-10

The problem is that sparse_softmax_cross_entropy_with_logits() applies the softmax to the logits internally, so you cannot change its behavior.

To work around this, code the cross-entropy error yourself and add the constant 1e-10 to the output of the softmax, not to the logits.

loss = -tf.reduce_sum(labels * tf.log(tf.nn.softmax(logits) + 1e-10))

Be aware that with the sparse_softmax_cross_entropy_with_logits() function the variable labels holds the numeric index of each label, but if you implement the cross-entropy loss yourself, labels has to be the one-hot encoding of those numeric labels.
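
For example (a sketch; num_classes is a placeholder for the number of classes, not a name from the question):

labels_one_hot = tf.one_hot(labels, depth=num_classes)
loss = -tf.reduce_sum(labels_one_hot * tf.log(tf.nn.softmax(logits) + 1e-10))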

Update: I have corrected the answer thanks to the comment by @mdaoust. As he said, the zeros are only relevant after the softmax function has been applied to the logits, not before.

Guillem Cucurull
  • A logit of zero is nothing special; logits can be negative. Clipping to [-100, 100] would be more reasonable, but may not solve the problem. – mdaoust Sep 20 '16 at 10:04
  • You're right, it only matters if the softmax output is zero, not if the logit is zero. Thanks! – Guillem Cucurull Sep 20 '16 at 10:16