
I'm learning TensorFlow and want to relate its implementation to the underlying mathematics.

From my understanding, the mathematical cross entropy requires its input to sum to 1. In the following code, y_true is a valid input while y_pred is not a mathematically valid one:

y_true = [[0, 1]]
y_pred = [[1.0, 20.0]]
print(tf.keras.losses.CategoricalCrossentropy(from_logits=False).call(y_true, y_pred))
print(tf.keras.losses.CategoricalCrossentropy(from_logits=True).call(y_true, y_pred))

Gives:

tf.Tensor([0.04879016], shape=(1,), dtype=float32)
tf.Tensor([0.], shape=(1,), dtype=float32)

Please find the gist here.

This answer says:

if from_logits=False, means the input is a probability

This answer says:

from_logits=True means the input to crossEntropy layer is normal tensor/logits

This answer says:

"Another name for raw_predictions in the above code is logit

from_logits, I guess, means the input is raw_predictions.

Since my inputs are not probabilities, I set from_logits=True, but the result I get is 0.

Can anyone explain?


1 Answer


The cross entropy between labels [[0, 1]] and logits [[1, 20]] should be a value very close to 0 (and some outputs might represent it as zero due to floating point imprecision). Represented as probabilities, these logits would be approximately [[0.000000005, 1]]. Notice how close these probabilities are to the labels. The cross entropy should therefore be very low.
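
To make that concrete, here is a small sketch of the same computation done by hand (this is illustrative arithmetic, not the exact code path the loss takes internally):

import tensorflow as tf

logits = tf.constant([[1.0, 20.0]])
labels = tf.constant([[0.0, 1.0]])

# Softmax turns the logits into probabilities: exp(x) / sum(exp(x)).
probs = tf.nn.softmax(logits)
print(probs.numpy())  # [[5.6e-09, 1.0]] -- almost identical to the labels

# Cross entropy: -sum(labels * log(probs)) over the class axis.
ce = -tf.reduce_sum(labels * tf.math.log(probs), axis=-1)
print(ce.numpy())  # [0.] -- zero within float32 precision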

As the OP points out, from_logits=True should be used when operating on unscaled outputs, i.e., on outputs taken before softmax. Softmax maps those unscaled outputs to probabilities; to compute cross entropy on the probabilities, use from_logits=False.
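
In model terms, this usually comes down to whether the last layer applies softmax. The following is a hypothetical toy setup (the layer sizes and optimizer are placeholders), only to show where each setting fits:

import tensorflow as tf

# Last layer has no activation -> outputs are raw logits -> from_logits=True.
logits_model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
logits_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
)

# Last layer applies softmax -> outputs are probabilities -> from_logits=False.
probs_model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="softmax")])
probs_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
)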

Here is an example:

import tensorflow as tf

y_true = tf.convert_to_tensor([[0, 1]], "float32")
y_pred = tf.convert_to_tensor([[1, 20]], "float32")

ce_logits_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
ce_probs_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)

print(ce_logits_fn(y_true, y_pred))
# tf.Tensor(0.0, shape=(), dtype=float32)

print(ce_probs_fn(y_true, tf.nn.softmax(y_pred)))
# tf.Tensor(1.1920929e-07, shape=(), dtype=float32)
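
This also explains the first number in the question: with from_logits=False, Keras (as far as I can tell from its backend implementation) rescales y_pred so it sums to 1 before taking the log, so the raw [[1.0, 20.0]] is silently treated as [[1/21, 20/21]]:

import tensorflow as tf

y_true = tf.convert_to_tensor([[0, 1]], "float32")
y_pred = tf.convert_to_tensor([[1, 20]], "float32")

# Rescaled, not softmaxed: the loss becomes -log(20/21) rather than a true
# cross entropy on probabilities.
ce_probs_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
print(ce_probs_fn(y_true, y_pred))
# tf.Tensor(0.04879016, shape=(), dtype=float32)  == -log(20/21)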

Try with predictions that are closer together. In the example above, the logit of the correct class is much larger than that of the incorrect class, so the cross entropy is low.

import tensorflow as tf

y_true = tf.convert_to_tensor([[0, 1]], "float32")
y_pred = tf.convert_to_tensor([[5, 7]], "float32")

ce_logits_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
ce_probs_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)

print(ce_logits_fn(y_true, y_pred))
# tf.Tensor(0.12692805, shape=(), dtype=float32)

print(ce_probs_fn(y_true, tf.nn.softmax(y_pred)))
# tf.Tensor(0.126928, shape=(), dtype=float32)
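
For reference, the same value can be reproduced by hand (a sketch of the underlying arithmetic):

import tensorflow as tf

y_true = tf.convert_to_tensor([[0, 1]], "float32")
y_pred = tf.convert_to_tensor([[5, 7]], "float32")

# softmax([5, 7]) = [1/(1+e^2), e^2/(1+e^2)] ~= [0.1192, 0.8808]
probs = tf.nn.softmax(y_pred)
print(probs.numpy())  # [[0.1192 0.8808]]

# cross entropy = -log(probability of the true class) ~= -log(0.8808)
print(-tf.math.log(probs[0, 1]).numpy())  # ~0.126928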
  • That clarifies a lot, thanks! I find `ce_logits_fn(y_true, tf.nn.softmax(y_pred))` outputs a different result. I guess softmax is not idempotent. If the input happens to look like [0.1, 0.9], should I set `from_logits=True`? Should I check each input, and if any input doesn't look like a probability, assume the inputs didn't go through softmax? – Gqqnbig Apr 04 '21 at 14:37
  • Well, my real concern is whether it is OK to always turn on `from_logits=True`. If my model already has a softmax layer, does keeping `from_logits=True` affect training? – Gqqnbig Apr 04 '21 at 14:41
  • Because your model is giving probabilities, you should use `from_logits=False`. Newer versions (not sure since when) of `tf.keras` will implicitly [use the logits if they are available](https://github.com/tensorflow/tensorflow/blob/85c8b2a817f95a3e979ecd1ed95bff1dc1335cff/tensorflow/python/keras/backend.py#L4835-L4839). Interesting point that softmax of probabilities does not return the same values... I would be very curious to know why. – jkr Apr 04 '21 at 14:48
  • If you were to use `from_logits=True` on probabilities, then your loss would be wrong, and your model would not learn correctly. – jkr Apr 04 '21 at 14:50
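
To illustrate the last two comments with a small sketch: applying from_logits=True to probabilities pushes them through softmax a second time, which distorts the loss:

import tensorflow as tf

y_true = tf.convert_to_tensor([[0, 1]], "float32")
logits = tf.convert_to_tensor([[5, 7]], "float32")
probs = tf.nn.softmax(logits)  # [[0.1192, 0.8808]]

ce_logits_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# Correct: raw logits with from_logits=True.
print(ce_logits_fn(y_true, logits))  # ~0.1269

# Wrong: probabilities with from_logits=True -- softmax is applied again,
# squashing [0.1192, 0.8808] to roughly [0.32, 0.68] and inflating the loss.
print(ce_logits_fn(y_true, probs))   # ~0.38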