
I'm wondering how the Accuracy metric in TensorFlow/Keras calculates whether a given output matches the expected prediction, or, in other words, how it determines the predicted class of the net.


Example 1:

Output: [0, 0, 0.6], expected output: [0, 0, 1]

I assume the 0.6 is just rounded to 1, correct? Or is it seen as the only number greater than 0.5 and hence taken as the predicted class?

But, if so, then consider Example 2:

Output: [0.6, 2, 0.1], expected output: [1, 0, 0]

I know, such an output is not possible with softmax which would be the default choice here. But it would be possible with other activation functions.

Is the greatest number now simply "extracted" and taken as the prediction? That would be the 2, which would make this a false prediction.

Example 3:

Output: [0.1, 0, 0.2], expected output: [0, 0, 1]

Since every number in the output is less than 0.5, I'd guess that the accuracy calculator sees this output as [0, 0, 0], so also not a correct prediction. Is that right?


If my preceding assumptions are correct, would the rule then be as follows?

Every number less than 0.5 counts as a 0 in terms of prediction, and from the numbers greater than or equal to 0.5 the greatest one is chosen; its index then represents the predicted class.


If that were so, could accuracy only be used for classifications with exactly one correct class (so, e.g., there couldn't be an expected output like [1, 0, 1])?
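In code, the rule I'm imagining would look roughly like this (a NumPy sketch of my own guess, not actual Keras code):

```python
import numpy as np

def hypothesized_prediction(output):
    """My hypothesized rule: treat everything below 0.5 as 0,
    then take the greatest remaining value as the predicted class
    (or no class at all if everything is below 0.5)."""
    out = np.asarray(output, dtype=float)
    mask = out >= 0.5
    if not mask.any():
        return None  # no value reaches 0.5: no predicted class at all
    # among the values >= 0.5, pick the index of the greatest one
    candidates = np.where(mask, out, -np.inf)
    return int(np.argmax(candidates))

print(hypothesized_prediction([0, 0, 0.6]))    # 2
print(hypothesized_prediction([0.6, 2, 0.1]))  # 1
print(hypothesized_prediction([0.1, 0, 0.2]))  # None
```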

mathematics-and-caffeine

2 Answers


There are several issues with your question.

To start with, we have to clarify the exact setting: in single-label multi-class classification (i.e. a sample can belong to one and only one class) with one-hot encoded labels (and predictions), all the examples you show here are invalid. The elements of the output array are not only less than 1, they also have to add up to 1 (since they are treated as probabilities).

Having clarified that, it's straightforward to see that there is no need to threshold at any value (e.g. at 0.5, as you suggest here); you just take the argmax. So [0.25, 0.35, 0.4] becomes [0, 0, 1].
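For instance, the argmax step in plain NumPy (outside Keras, just to illustrate):

```python
import numpy as np

probs = np.array([0.25, 0.35, 0.4])   # valid softmax output: sums to 1
predicted_class = int(np.argmax(probs))      # index of the largest entry
one_hot = np.eye(len(probs))[predicted_class]

print(predicted_class)  # 2
print(one_hot)          # [0. 0. 1.]
```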

From this example, it should also be apparent that, in such a setting, there can be cases where no individual element is greater than 0.5, and this is very natural. New practitioners are prone to the confusion that 0.5 plays some special role here, as it does in binary classification; but in multi-class classification, 0.5 no longer plays any special role. The equivalent "threshold" in (single-label) multi-class settings is 1/n, where n is the number of classes (0.33 in the example here, since we have 3 classes): given the constraint that the array elements are less than 1 and add up to 1, there will always be at least one entry greater than or equal to 1/n. But simply taking the argmax does the job, without any need for intermediate thresholding.

I know, such an output is not possible with softmax which would be the default choice here. But it would be possible with other activation functions.

As long as we keep the discussion to meaningful classification settings (and not just crazy computational experiments), this is not correct; the only other sensible activation function for classification is the sigmoid, which again gives results less than 1 (although no longer adding up to 1). You can of course ask for a linear (or even relu) activation in the final layer; your program will not crash, but that doesn't mean you are doing anything meaningful from a modeling perspective, which I trust is what you are actually interested in here.

then accuracy can be only used for classifications with only one corresponding correct class (so e.g. there can't be an expected output like [1, 0, 1])?

This is a completely different context altogether, called multi-label multi-class classification (i.e. a sample can belong to more than one class). It should be clear by now that results like [1, 0, 1] can never occur in the single-label multi-class case (unless such cases already exist in your true labels). See What are the measure for accuracy of multilabel data? for the general case, and How does Keras handle multilabel classification? (hint: with sigmoid).
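In the multi-label case, Keras' binary_accuracy is the usual metric; a NumPy sketch of what it computes is, roughly, element-wise rounding (i.e. thresholding at 0.5) followed by element-wise comparison with the multi-hot target:

```python
import numpy as np

def binary_accuracy(y_true, y_pred):
    # each sigmoid output is thresholded at 0.5 independently (via
    # rounding), then compared element-wise with the multi-hot target;
    # the mean fraction of matching entries is the accuracy
    return np.mean(np.equal(y_true, np.round(y_pred)))

y_true = np.array([1, 0, 1])          # a multi-hot target is legal here
y_pred = np.array([0.8, 0.3, 0.6])    # hypothetical sigmoid outputs

print(binary_accuracy(y_true, y_pred))  # 1.0
```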

desertnaut
  • Thank you very much, this truly helped me a lot! Actually, I was thinking of final-layer activation functions like `linear`, just to understand the way Accuracy works (although that doesn't make sense). So I assume now that in binary classification the 0.5 is important in the sense that if the output is less, it is a `0`, and otherwise a `1`, in classification terms. Good to know that this treatment of `0.5` is only done in binary classification; I wasn't aware of the distinction between single-label, multi-label, binary etc. Anyway, I got it now, thank you a lot. – mathematics-and-caffeine May 02 '20 at 12:05
  • @LukasNießen You are very welcome; as you'll see, 0.5 may find again some special importance in the case of *multi-label* (but I had to end the answer at some point). – desertnaut May 02 '20 at 12:36

Accuracy in Keras defaults to categorical accuracy, which seems to be the appropriate choice for you. It calculates the mean accuracy rate across all predictions in multi-class classification problems.

The code for it is the following:

from keras import backend as K

def categorical_accuracy(y_true, y_pred):
    # a prediction counts as correct when the index of its largest value
    # matches the index of the 1 in the one-hot target
    return K.mean(K.equal(K.argmax(y_true, axis=-1), K.argmax(y_pred, axis=-1)))

Meaning that example 1

[0, 0, 0.6]

will be

[0, 0, 1]

Example 2

[0.6, 2, 0.1]

will be

[0, 1, 0]

Example 3

[0.1, 0, 0.2]

will be

[0, 0, 1]

These are then compared to the targets

[0, 0, 1], [1, 0, 0], [0, 0, 1] 

and the accuracy is the mean over these three comparisons. Two of the three predictions match their target, so your accuracy would be

0.66
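The whole computation above can be reproduced with a small NumPy sketch (mirroring what the Keras backend code does):

```python
import numpy as np

# the three example outputs from the question, stacked as a batch
y_pred = np.array([[0.0, 0.0, 0.6],
                   [0.6, 2.0, 0.1],
                   [0.1, 0.0, 0.2]])
# the corresponding one-hot targets
y_true = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 0, 1]])

# a prediction is correct when the argmax indices agree
matches = np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)
print(matches)         # [ True False  True]
print(matches.mean())  # 0.6666...
```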
Lowry
  • Thanks a lot for this excellent answer! One more question: so you can't use Accuracy if you also have targets like `[0, 0, 0]` with no correct classification? – mathematics-and-caffeine May 02 '20 at 11:54
  • Argmax will always give you the index with the highest value, so no. – Lowry May 02 '20 at 11:59
  • @LukasNießen the question is purely academic; provided that you have correctly set up your multi-class problem, you can never have such outputs. – desertnaut May 02 '20 at 14:46