
I want to evaluate a model and, at the same time, capture the activations of its penultimate layer. I used this answer as a solution and access the penultimate activations with `pen_ulti_activs = layer_outs[-2]`.

To double-check that the solution actually works, I added an assert that compares the last-layer activations returned by `functor` with the array returned by `model.predict`. The assert fails, though, so I guess I am misunderstanding how the linked answer is intended to be used.

from keras import backend as K
import numpy as np


def evaluate_model(model, test_gen):
    inp = model.input                                           # input placeholder
    outputs = [layer.output for layer in model.layers]          # all layer outputs
    functor = K.function([inp, K.learning_phase()], outputs)    # evaluation function

    for inputs, targets in test_gen:
        layer_outs = functor([inputs, 1.])                      # learning phase passed along with the inputs

        predictions = layer_outs[-1]                            # outputs of the last layer
        predictions_ = model.predict(inputs)

        assert np.allclose(predictions, predictions_)

So: why are `predictions` and `predictions_` not equal? Shouldn't `model.predict` return the same values as the outputs of the last layer? After all, `model.predict` is supposed to return exactly those outputs.

lo tolmencre

1 Answer


You don't give many details about your model, so one can only guess. One possibility is that you are doing classification with a softmax cross-entropy loss, in which case the last layer typically outputs (unnormalized) logits, whereas predict() applies a softmax to this output to return normalized probabilities.
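
If that is indeed the cause, you can verify it by normalizing the last-layer output yourself and comparing again. A minimal sketch, assuming `layer_outs`, `inputs` and `model` are the objects from the question's loop and that the mismatch is only a missing softmax:

import numpy as np

logits = layer_outs[-1]                                   # raw (unnormalized) last-layer output
# numerically stable softmax over the class axis
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

# should now match the normalized probabilities from predict()
assert np.allclose(probs, model.predict(inputs), atol=1e-5)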

P-Gn
  • Then that must be the reason. So `functor` does in fact not return the layer outputs but the layer "pre-activations", i.e. the activations before the activation function is applied? – lo tolmencre Apr 03 '19 at 20:42
  • Technically, the output has no activation -- it is `predict` that applies an extra `softmax` to the output. – P-Gn Apr 04 '19 at 07:13
  • Why is that technically the case? The activation functions of all other layers are also part of the output computation for these layers. Why would the last layer be different? Or am I missing something? – lo tolmencre Apr 04 '19 at 09:32
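
To make the distinction from the comments concrete, here is a minimal, hypothetical sketch (layer sizes and input shape are made up): whether the last layer's output already includes the softmax depends entirely on how the head was defined.

from keras.models import Sequential
from keras.layers import Dense

# Variant A: softmax is part of the last layer, so its layer.output is already
# normalized probabilities.
model_a = Sequential([Dense(32, activation='relu', input_shape=(8,)),
                      Dense(5, activation='softmax')])

# Variant B: the last layer is linear ("no activation"), so its layer.output is
# raw logits; the softmax has to be applied somewhere else, e.g. in a
# from_logits-style loss or in a separate prediction step.
model_b = Sequential([Dense(32, activation='relu', input_shape=(8,)),
                      Dense(5)])

print(model_a.layers[-1].activation.__name__)   # 'softmax'
print(model_b.layers[-1].activation.__name__)   # 'linear'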