
I am creating a model in Keras and want to compute my own metric (perplexity). This requires using the unnormalized probabilities/logits. However, the Keras model only returns the softmax probabilities:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation
from keras.optimizers import RMSprop

model = Sequential()
model.add(embedding_layer)
model.add(LSTM(n_hidden, return_sequences=False))
model.add(Dropout(dropout_keep_prob))
model.add(Dense(vocab_size))
model.add(Activation('softmax'))  # final softmax normalizes the Dense outputs
optimizer = RMSprop(lr=self.lr)

model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy')

The Keras FAQ has a solution for getting the output of intermediate layers here, and another solution is given here. However, those answers store the intermediate outputs in a separate model, which is not what I need. I want to use the logits for my custom metric, and the custom metric should be passed to model.compile() so that it is evaluated and displayed during training. So I don't need the output of the Dense layer in a separate model, but as part of my original model.

In short, my questions are:

  • When defining a custom metric as outlined here using def custom_metric(y_true, y_pred), does y_pred contain logits or normalized probabilities?

  • If it contains normalized probabilities, how can I get the unnormalized probabilities, i.e. the logits output by the Dense layer? (A sketch of the metric I have in mind follows below.)
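
For concreteness, the perplexity metric I have in mind would look roughly like this (just a sketch; it assumes y_pred already holds logits, which is exactly what I am unsure about):

from keras import backend as K

def perplexity(y_true, y_pred):
    # interpret y_pred as logits and let the backend apply the softmax internally
    cross_entropy = K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
    return K.exp(K.mean(cross_entropy))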

Ioannis Nasios
Lemon

3 Answers


I think I have found a solution.

First, I change the activation layer to linear so that I receive logits, as outlined by @Ioannis Nasios.

Second, to keep using sparse_categorical_crossentropy as the loss, I define my own loss function and set the from_logits parameter to True.

from keras import backend as K

model = Sequential()
model.add(embedding_layer)
model.add(LSTM(n_hidden, return_sequences=False))
model.add(Dropout(dropout_keep_prob))
model.add(Dense(vocab_size))
model.add(Activation('linear'))  # linear (identity) activation, so the model outputs logits
optimizer = RMSprop(lr=self.lr)


def my_sparse_categorical_crossentropy(y_true, y_pred):
    # treat y_pred as logits so the loss applies the softmax internally
    return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

model.compile(optimizer=optimizer, loss=my_sparse_categorical_crossentropy)
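
With this in place, the perplexity metric sketched in the question can simply be passed to compile() as well, since y_pred now holds logits (sketch; perplexity is the metric defined in the question):

model.compile(optimizer=optimizer,
              loss=my_sparse_categorical_crossentropy,
              metrics=[perplexity])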
Lemon
  • So, you are explicitly acknowledging a *helpful* answer, but do not even care to upvote it?? – desertnaut Nov 01 '17 at 17:04
  • Not sure why you're wrongly accusing another user. I upvoted Ioannis Nasios's answer right away. However, someone else must have downvoted it, so my upvote is not displayed. – Lemon Nov 02 '17 at 07:37
  • And how do you get the softmax probabilities in the end? – ricoms Feb 28 '19 at 19:14

Try changing the last activation from softmax to linear:

model = Sequential()
model.add(embedding_layer)
model.add(LSTM(n_hidden, return_sequences=False))
model.add(Dropout(dropout_keep_prob))
model.add(Dense(vocab_size))
model.add(Activation('linear'))
optimizer = RMSprop(lr=self.lr)

model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
Ioannis Nasios
  • But I do need softmax probabilities in the end, so the final model outputs must be probabilities – Lemon Oct 31 '17 at 14:38
  • Why does changing the activation to linear get the logits rather than the probabilities? – JobHunter69 Jun 13 '19 at 19:18
  • @Goldname because linear activation means just pass on the values without applying any activations. Logits in ML just refers to unnormalized log probabilities. If you change the activation to softmax, it takes the logits and passes them through the softmax function, which returns normalised probabilities. [More info](https://stackoverflow.com/questions/41455101/what-is-the-meaning-of-the-word-logits-in-tensorflow?noredirect=1&lq=1) – layser Aug 15 '19 at 17:51
  • I don't think this works. The optimizer doesn't automatically pick up that the activation is linear and not softmax. – Björn Lindqvist Jun 17 '20 at 08:14
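
For the question in the comments about getting the softmax probabilities back: with a linear output layer you can apply the softmax yourself after predicting. A minimal sketch using NumPy (x_batch is a placeholder for an input batch):

import numpy as np

logits = model.predict(x_batch)                         # raw scores from the linear output layer
shifted = logits - logits.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)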

You can make a model for training and another for predictions.

For training, you can use the functional API and take part of the existing model, leaving the final Activation aside:

from keras.models import Model

model = yourExistingModelWithSoftmax
modelForTraining = Model(model.input, model.layers[-2].output)  # everything up to the logits (drops the final Activation)

# use your loss function (and custom metric) in this model:
modelForTraining.compile(optimizer=optimizer,
                         loss=my_sparse_categorical_crossentropy,
                         metrics=[my_custom_metric])

Since one model is part of the other, they both share the same weights.

  • When you want to train, use modelForTraining.fit()
  • When you want to predict probabilities, use model.predict().
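
Put together, the workflow would look roughly like this (a sketch; x_train, y_train and x_new are placeholder arrays):

modelForTraining.fit(x_train, y_train, epochs=10)  # trains the weights shared with the full model
probabilities = model.predict(x_new)               # the full model still ends in the softmax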
Daniel Möller
  • I think the solution I found above is even easier. However, I still don't know what y_pred is in Keras when creating a custom metric. Is it always the output of the last network layer? – Lemon Oct 31 '17 at 15:09
  • Yes, `y_pred` is the output of the model, while `y_true` is the true data you pass to `fit`. --- My solution considers that you will need the softmax probabilities later. – Daniel Möller Oct 31 '17 at 15:12
  • Can't I still work with softmax probabilities later as long as I always add a softmax? So for example, when calling model.predict(input) the output would be unnormalized probabilities (=logits). I would get the needed softmax probabilities when adding probs = K.nn.softmax(model.predict(input)). Correct? – Lemon Oct 31 '17 at 15:22
  • Regarding your answer: I have to use sparse_categorical_crossentropy as the loss function during training. When training only the first part of the model, it uses the unnormalized probabilities as the input to the loss function, which would yield wrong results. – Lemon Oct 31 '17 at 15:26
  • You can only get the metrics from the outputs, so you need non-normalized outputs. You can normalize them inside your custom loss function if you want. In either case, you will find later that you need a model that performs the softmax. You will end up with a variation of this answer plus your answer. – Daniel Möller Oct 31 '17 at 16:01
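
To illustrate that last point with a small sketch: a custom metric can normalize the logits itself before computing anything (normalized_crossentropy is a hypothetical example name):

from keras import backend as K

def normalized_crossentropy(y_true, y_pred):
    probs = K.softmax(y_pred)  # turn the logits back into probabilities inside the metric
    return K.mean(K.sparse_categorical_crossentropy(y_true, probs, from_logits=False))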