Because of how the softmax function works and how the network is trained, you do need the 4th ("other") class.
Let's take a concrete example: you train your network to distinguish between apples, oranges and bananas, but at inference time you somehow get a photo of a plum.
It may be surprising at first, but you really do need the "other" class in your dataset: there is no guarantee that thresholding will let you filter out such unknown inputs.
You may expect the following two cases:
- The output probability for an unknown input is guaranteed to be roughly 1/N, given that you are testing on an (N+1)-th class the network never saw.
- There is a certain threshold (like your assumed 90%) below which you can decide the input belongs to none of the known classes.

Neither expectation is guaranteed to hold, as the sketch below illustrates.
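Here is a minimal sketch (the logits are made up for a hypothetical out-of-distribution input such as the plum) showing that softmax need not produce a uniform 1/N output for an unseen input, and can easily clear a 90% threshold:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalize to sum to 1.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical raw logits the network might produce for a plum photo,
# an input that belongs to none of the trained classes.
plum_logits = np.array([4.1, 1.2, 0.3])  # scores for apple, orange, banana

probs = softmax(plum_logits)
for name, p in zip(["apple", "orange", "banana"], probs):
    print(f"{name}: {p:.3f}")
# apple: 0.928
# orange: 0.051
# banana: 0.021
# Far from the uniform 1/3 you might expect, and above a 90% threshold.
```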
Now consider these cases:
- What if an apple really looks like an orange, and your model correctly predicts 40% apple, 30% orange, 30% banana? Because you applied your threshold, a correctly identified apple (a true positive) is thrown away. This is a simple case in which thresholding discards a good output of your network.
- You can still get a 91% assignment to one of the known classes even though the new 'fruit' is not part of your dataset at all; this follows from how softmax normalizes the scores into a probability distribution over the known classes only. See the sketch below.
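To make both failure modes concrete, here is a small sketch (the probability vectors are invented for illustration) of what a naive 90% threshold does:

```python
THRESHOLD = 0.90
CLASSES = ["apple", "orange", "banana"]

def decide(probs):
    # Accept the top class only if its probability clears the threshold.
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] >= THRESHOLD:
        return CLASSES[best]
    return "rejected"

# Case 1: a real apple that genuinely resembles an orange.
# The prediction is correct, but the threshold throws it away.
print(decide([0.40, 0.30, 0.30]))  # -> rejected (a true positive lost)

# Case 2: a plum, which is none of the trained classes,
# yet softmax still concentrates its mass on one class.
print(decide([0.91, 0.05, 0.04]))  # -> apple (a false acceptance)
```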
Personal experience: I once trained a network to distinguish between many types of traffic signs. Out of pure curiosity, I fed it a picture of a living-room chair. I expected the same thing you did (that a threshold would catch it), but much to my surprise, it came out as 85%
"Yield Way".