As you can see in the tutorial, the model is defined something like this:
model = tf.keras.Sequential([
    layers.Embedding(max_features + 1, embedding_dim),
    layers.Dropout(0.2),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.2),
    layers.Dense(1)])
The dataset used in that tutorial is for binary classification, with labels 0 and 1. By defining no activation on the last layer, the original author gets the logits rather than probabilities, which is why the loss function is set up as
model.compile(loss=losses.BinaryCrossentropy(from_logits=True),
...
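To see concretely what from_logits means, here is a minimal sketch (made-up numbers): feeding raw logits to BinaryCrossentropy(from_logits=True) gives the same loss as feeding their sigmoid to BinaryCrossentropy(from_logits=False).

import tensorflow as tf

# Two samples with made-up raw scores, as a Dense(1) layer would emit them.
y_true = tf.constant([[1.0], [0.0]])
logits = tf.constant([[0.8], [-1.2]])

loss_a = tf.keras.losses.BinaryCrossentropy(from_logits=True)(y_true, logits)
loss_b = tf.keras.losses.BinaryCrossentropy(from_logits=False)(y_true, tf.sigmoid(logits))

print(loss_a.numpy(), loss_b.numpy())  # both ~0.3172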
Now, if we set the last layer's activation to sigmoid (the usual pick for binary classification), then we must set from_logits=False. So, here are the two options to choose from:
With logits: from_logits=True

We take the raw logits from the last layer, which is why we set from_logits=True.
import tensorflow as tf
from tensorflow.keras import layers, losses

# max_features, embedding_dim, train_ds, val_ds, and epochs come from
# the tutorial's data-preparation steps.
model = tf.keras.Sequential([
    layers.Embedding(max_features + 1, embedding_dim),
    layers.Dropout(0.2),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.2),
    layers.Dense(1, activation=None)])  # no activation: the model outputs logits

model.compile(loss=losses.BinaryCrossentropy(from_logits=True),
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(
    train_ds, verbose=2,
    validation_data=val_ds,
    epochs=epochs)
Epoch 1/3
7ms/step - loss: 0.6828 - accuracy: 0.5054 - val_loss: 0.6148 - val_accuracy: 0.5452
Epoch 2/3
7ms/step - loss: 0.5797 - accuracy: 0.6153 - val_loss: 0.4976 - val_accuracy: 0.7406
Epoch 3/3
7ms/step - loss: 0.4664 - accuracy: 0.7734 - val_loss: 0.4197 - val_accuracy: 0.8096
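One practical consequence of this option: model.predict now returns unbounded raw scores, so to get probabilities at inference time we have to apply the sigmoid ourselves. A minimal sketch, using the model trained above:

raw_scores = model.predict(val_ds)       # unbounded reals, shape (num_examples, 1)
probs = tf.sigmoid(raw_scores).numpy()   # squashed into (0, 1)
preds = (probs > 0.5).astype('int32')    # 0/1 class labels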
Without logits: from_logits=False

Here we take the probability from the last layer, which is why we set from_logits=False.
model = tf.keras.Sequential([
    layers.Embedding(max_features + 1, embedding_dim),
    layers.Dropout(0.2),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.2),
    layers.Dense(1, activation='sigmoid')])  # sigmoid: the model outputs probabilities

model.compile(loss=losses.BinaryCrossentropy(from_logits=False),
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(
    train_ds, verbose=2,
    validation_data=val_ds,
    epochs=epochs)
Epoch 1/3
8ms/step - loss: 0.6818 - accuracy: 0.6163 - val_loss: 0.6135 - val_accuracy: 0.7736
Epoch 2/3
7ms/step - loss: 0.5787 - accuracy: 0.7871 - val_loss: 0.4973 - val_accuracy: 0.8226
Epoch 3/3
8ms/step - loss: 0.4650 - accuracy: 0.8365 - val_loss: 0.4195 - val_accuracy: 0.8472
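With this option, model.predict already returns probabilities, so no extra step is needed:

probs = model.predict(val_ds)            # already in (0, 1)
preds = (probs > 0.5).astype('int32')    # 0/1 class labels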
Now, you may wonder why this tutorial uses logits (that is, no activation on the last layer). The short answer is that it generally doesn't matter; we can choose either option. The catch is that there is a chance of numerical instability when using from_logits=False. Check this answer for more details.
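As a quick illustration of that instability (made-up numbers): with an extreme logit, the sigmoid saturates near 0, and the probability path loses precision while the logits path stays exact.

import tensorflow as tf

y_true = tf.constant([[1.0]])
logit = tf.constant([[-20.0]])  # a very confident, very wrong raw score

# Stable: cross-entropy computed directly from the logit.
loss_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)(y_true, logit)

# Less stable: sigmoid(-20) is ~2e-9, which Keras clips to its epsilon (1e-7),
# so the loss saturates and no longer reflects the true value.
prob = tf.sigmoid(logit)
loss_probs = tf.keras.losses.BinaryCrossentropy(from_logits=False)(y_true, prob)

print(loss_logits.numpy())  # ~20.0 (exact)
print(loss_probs.numpy())   # ~16.12, i.e. -log(1e-7), capped by clipping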