I have a deep learning classification problem with 17 classes and I am working in PyTorch. The network ends in a linear layer whose output is passed to the cross-entropy loss.
I believe that, normally, one applies a softmax activation and interprets the result as the probabilities of the corresponding output classes. But softmax preserves the ordering of its inputs (a larger score always maps to a larger probability), so it seems that, if I just want the most probable class, I can simply choose the class with the maximum score after the linear layer and leave the softmax out.
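To make the reasoning concrete, here is a minimal sketch of what I mean. It uses plain Python rather than PyTorch, and the 17 scores are made-up values standing in for the output of the linear layer, so it only illustrates the idea and is not my actual model:

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability;
    # this shift does not change the resulting probabilities.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest value (the first one in case of ties).
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical raw scores for 17 classes (assumed values, for illustration only).
logits = [0.3, -1.2, 2.5, 0.0, 1.1, -0.7, 0.9, 3.4, -2.0,
          0.5, 1.8, -0.3, 2.9, 0.2, -1.5, 1.0, 0.7]

probs = softmax(logits)

# The predicted class is the same with or without the softmax.
print(argmax(logits) == argmax(probs))
```

In this sketch the softmax changes the values (they become positive and sum to 1) but not which class is the largest.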
Given that softmax is the default, widely used activation in classification problems, I wonder if I am missing something important here. Can anyone guide me?
Note that I have searched a large number of sites but, as far as I could tell, none of them answers this basic question (although they provided a lot of other information).
Thanks