
From the PyTorch documentation, CrossEntropyLoss combines LogSoftmax and NLLLoss in a single class.
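For reference, that documented equivalence can be checked directly (an illustrative sketch added here, not part of the original question):

import torch
import torch.nn as nn

x = torch.randn(4, 10)            # a batch of raw logits
t = torch.randint(0, 10, (4,))    # target class indices

ce = nn.CrossEntropyLoss()(x, t)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(x), t)
print(torch.allclose(ce, nll))    # True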

But I am curious: what happens if I use CrossEntropyLoss as the criterion and also keep LogSoftmax in my classifier:

model_x.fc = nn.Sequential(nn.Linear(num_ftrs, 2048, bias=True), nn.ReLU(),
                           nn.Linear(2048, 1024), nn.ReLU(),
                           nn.Linear(1024, 256), nn.ReLU(),
                           nn.Linear(256, 128), nn.ReLU(),
                           nn.Linear(128, num_labels), nn.LogSoftmax(dim=1))

criterion = nn.CrossEntropyLoss()

Also, if I have saved a trained model using the code above, how can I check which criterion the saved model was trained with?

Curerious

2 Answers


TL;DR: You will decrease the expressivity of the model, because it can only produce relatively flat distributions.

What you suggest in the snippet amounts to applying the softmax normalization twice: once via the LogSoftmax layer and once inside CrossEntropyLoss. This gives you a distribution with the same ranking of probabilities, but a much flatter one, and it prevents the model from producing a low-entropy (i.e., confident) output distribution. In theory, the output of a linear layer can be any real number; in practice, the logits take both positive and negative values, which is what allows spiky distributions. After a softmax, you only have probabilities between 0 and 1, so a subsequent log-softmax only ever sees inputs from that narrow range and gives you negative numbers.
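A small sketch of the flattening effect described above (my own illustration, using only torch):

import torch
import torch.nn.functional as F

logits = torch.tensor([[8.0, 0.0, -4.0]])   # spiky raw scores from a linear layer

once = F.softmax(logits, dim=1)   # ~[[9.997e-01, 3.4e-04, 6.1e-06]] -- low entropy
twice = F.softmax(once, dim=1)    # ~[[0.576, 0.212, 0.212]] -- same ranking, much flatter

print(once)
print(twice)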

Typically, models are saved without the loss function. Unless you explicitly saved the loss as well, there is no way to recover it from the checkpoint.
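To illustrate (a minimal sketch with a toy model and a hypothetical file name), a state_dict checkpoint contains only parameter tensors:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
torch.save(model.state_dict(), "model_x.pt")   # hypothetical file name

checkpoint = torch.load("model_x.pt")
print(list(checkpoint.keys()))   # ['weight', 'bias'] -- no trace of the criterion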

Jindřich
  • Thank you Jindřich! Though I am not familiar with the terms 'expressivity of the model' and 'flat distribution', as I am still new to this field. However, I will look those up and study them. Thanks again! – Curerious Oct 27 '20 at 01:26

You want your model's output distribution to have a clear boundary/threshold between the different classes. Applying CrossEntropyLoss on top of LogSoftmax reduces the effective range of the model's output, and it can be argued that this adversely affects the rate at which the model learns.
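A quick sketch of that range reduction (my own example, using only torch):

import torch
import torch.nn.functional as F

logits = torch.randn(1, 5) * 10          # raw linear-layer outputs: unbounded
log_probs = F.log_softmax(logits, dim=1)

print(logits)      # values can be large positive or negative
print(log_probs)   # always <= 0, squeezed into a narrower range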

Just save the losses in a dictionary along with your state_dict, or write them to a text file.
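A minimal sketch of that, assuming model_x and criterion as defined in the question and a hypothetical train_losses list:

import torch

checkpoint = {
    "model_state": model_x.state_dict(),
    "criterion": type(criterion).__name__,   # e.g. 'CrossEntropyLoss'
    "losses": train_losses,                  # hypothetical list of per-epoch losses
}
torch.save(checkpoint, "checkpoint.pt")

# Later, read the recorded information back:
saved = torch.load("checkpoint.pt")
print(saved["criterion"], saved["losses"][-1])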

Shirsho