According to Andrew Ng, when using TensorFlow for classification it's better to compute the loss from logits, i.e. use `from_logits`. That is, instead of:
```python
model = Sequential([
    ...,
    Dense(units=1, activation='sigmoid')
])
model.compile(..., BinaryCrossentropy())
```
the advice is to use
```python
model = Sequential([
    ...,
    Dense(units=1, activation='linear')
])
model.compile(..., BinaryCrossentropy(from_logits=True))
```
(and similar for multiclass).
As far as I understand, the only reason for doing so is to improve numerical stability.
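To make that concrete, here is a minimal sketch of the stability issue in plain Python (not TensorFlow's actual implementation; the stable formula is the standard log-sum-exp rearrangement that `from_logits=True` losses rely on):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_naive(y, z):
    # sigmoid first, then cross-entropy: 1 - p rounds to 0 for large z
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logits(y, z):
    # stable rearrangement: max(z, 0) - z*y + log(1 + exp(-|z|))
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

z, y = 40.0, 0.0  # a confidently wrong prediction
# sigmoid(40) rounds to exactly 1.0 in float64, so the naive form
# evaluates log(1 - 1.0) = log(0) and raises a math domain error,
# while the logits form returns the finite, correct loss of ~40.0.
```

Both forms agree for moderate logits; they only diverge once `sigmoid(z)` saturates in floating point, which is exactly the regime the rearrangement is there to handle.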
This makes me wonder: why doesn't TensorFlow do this transformation automatically? Surely the `compile` method must be able to see that a sigmoid activation is used for the last layer, replace it with linear, and effectively set `from_logits=True` internally? This would also allow TF to keep a consistent interface, e.g. make `.predict` work as expected.
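(The `.predict` inconsistency I mean is the extra post-processing step the second setup forces on you. A sketch in plain Python, with hypothetical logit values standing in for a real `model.predict` output:)

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw outputs of model.predict(...) when the last layer
# is linear (from_logits=True): these are logits, not probabilities.
logits = [-2.0, 0.0, 3.5]

# You have to recover the probabilities yourself; with TensorFlow this
# would be tf.math.sigmoid(model.predict(X)).
probs = [sigmoid(z) for z in logits]
```

With the sigmoid output layer of the first example, `model.predict` would return these probabilities directly.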
Is there any reason why TF would not want to do this? E.g. are there use cases where the first example is preferred over the second? Is there a performance penalty?