
I have gone through this post and this one; however, my question is very specific. If the output of my model.predict() function, for a classification problem with classes labelled 0 and 1, is something like:

array([[0.5147758 ],
       [0.48530805],
       [0.5122566 ],
       [0.4839405 ],
       [0.49831972],
       [0.4886117 ],
       [0.5130876 ],
       [0.50388396]], dtype=float32)

and I'm using binary_crossentropy loss with the last layer as:

Dense(1, activation='sigmoid')

then does each entry in the above output denote the probability of occurrence of class 0, or of class 1?


2 Answers


The sigmoid activation outputs values between 0 and 1, and you have only one neuron in your Dense layer. With binary cross-entropy loss, training pushes this single output towards the true label of each sample. So, to be precise, the output in your case is the probability of occurrence of class 1. For the class 0 probability, you have to compute 1 - output.
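As a minimal sketch of this interpretation (here model is assumed to be your trained Keras model and X a NumPy array of inputs; both are placeholders):

    # model.predict() returns P(class 1) for each sample, shape (n, 1)
    p_class1 = model.predict(X)
    p_class0 = 1.0 - p_class1               # probability of class 0
    labels = (p_class1 > 0.5).astype(int)   # hard labels via a 0.5 threshold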

Another commonly used activation in the final layer is 'softmax'. This activation gives you a probability of occurrence for each class, so the number of units in the final layer equals the number of classes. In that setup, we use categorical cross-entropy loss; see the sketch below.
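For illustration, a minimal sketch of that alternative setup (the hidden layer size, input shape and class count below are made up):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    num_classes = 3  # hypothetical number of classes

    model = Sequential([
        Dense(32, activation='relu', input_shape=(10,)),  # hypothetical hidden layer
        Dense(num_classes, activation='softmax'),         # one unit per class
    ])
    # targets must be one-hot encoded for categorical_crossentropy
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    # model.predict(X) then returns shape (n, num_classes), each row summing to 1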


does each entry in the above output denote the probability of occurrence of class 0, or of class 1?

The conventional interpretation of this output is as the probability of the sample belonging to class 1.

Intuitively, it may be convenient to imagine the outputs as trying to "replicate" the actual binary labels (0/1): the closer an output is to 1.0, the higher the probability of class 1 (and vice versa). Roughly speaking, this is exactly what the cross-entropy loss used here encodes:

Cross-Entropy

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0.
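In formula form, for a true binary label y and a predicted probability p of class 1, the per-sample loss is

    L(y, p) = -[y \log(p) + (1 - y) \log(1 - p)]

so for the example quoted above, y = 1 and p = 0.012 gives L = -log(0.012) ≈ 4.42, while a confident correct prediction of p = 0.99 gives L ≈ 0.01.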



was wondering whether there is some documentation where this is specifically mentioned

The reason you cannot find this explicitly stated in the Keras documentation is that it is much more general: it has to do with the fundamental ideas of log loss and binary classification, and nothing to do with Keras in particular. These threads may help convince you:

Loss & accuracy - Are these reasonable learning curves?

How does Keras evaluate the accuracy?
