
I want to calculate the L1 loss in a neural network. I came across this example at https://discuss.pytorch.org/t/simple-l2-regularization/139/2, but there are some errors in this code.

Is this really how to calculate L1 Loss in a NN or is there a simpler way?

l1_crit = nn.L1Loss()
reg_loss = 0
for param in model.parameters():
    reg_loss += l1_crit(param)

factor = 0.0005
loss += factor * reg_loss

Is this equivalent in any way to simply doing:

loss = torch.nn.L1Loss()

I assume not, because I am not passing along any network parameters. I am just checking whether there is an existing function to do this.

dorien
  • That code doesn't even work: `l1_crit` expects two arguments, not just one. And why would the loss be calculated from the model's parameters? The second snippet is obviously not the same as the first: only the first line is the same (apart from the `size_average` argument, which is deprecated anyway), and the rest is omitted. – Michael Jungo Jun 16 '20 at 08:36
  • Sorry, I got this code from https://discuss.pytorch.org/t/simple-l2-regularization/139/2. How do I make it work? I'll remove the deprecated argument. But I can't just feed the param to the function, right? – dorien Jun 16 '20 at 08:45
  • @dorien the link you provided is about L1 regularization. Is that what you want? That is different from calculating the L1 loss at the end of the network. Also, if you really want L1 regularization, you should replace (or rather fill in) the target with zeros – Proko Jun 16 '20 at 08:50
  • Oh indeed I want to include L1 regularization. Can you be more specific how I should replace the target with zeros? – dorien Jun 16 '20 at 09:01
  • I know for L2 I can do something like `optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)`, but L1 seems much more complicated – dorien Jun 16 '20 at 09:01
  • @dorien with L1 you basically want your parameters to be sparse, so you penalize them towards zero: `l1_crit(param, target=torch.zeros_like(param), size_average=False)`. But in general you don't need `L1Loss` for this; you can also just use the norm, see https://stackoverflow.com/questions/44641976/in-pytorch-how-to-add-l1-regularizer-to-activations. I don't think there is a simpler way to do this than the norm. Of course the use case in that link is different; here you would take all parameters of the model – Proko Jun 16 '20 at 09:17
  • Ah @Proko, that first answer is just what I was looking for (with norm). I'm preparing a lecture and wanted to know the easiest way to implement the L1 and L2 terms in the loss in PyTorch to enable regularisation. – dorien Jun 17 '20 at 02:18
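
Putting the suggestion from these comments together, a rough sketch of the L1-regularization-via-`L1Loss` route could look as follows (the `nn.Linear` model is only a stand-in for the real network, and `reduction='sum'` replaces the deprecated `size_average=False`):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)              # stand-in model, not the actual network
l1_crit = nn.L1Loss(reduction='sum')  # sum of absolute differences

# Penalize every parameter towards zero by comparing it against a zero target
reg_loss = sum(l1_crit(p, torch.zeros_like(p)) for p in model.parameters())

factor = 0.0005
# total_loss = task_loss + factor * reg_loss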

1 Answer


If I understand correctly, you want to compute the L1 loss of your model (as you say in the beginning). However, I think you might have got confused by the discussion on the PyTorch forum.

From what I understand of the PyTorch forum thread and the code you posted, the author is applying L1 regularization to the network weights. The idea is to add a penalty proportional to the sum of the absolute values of the parameters to the training loss, which pushes the weights towards small (and sparse) values; that is why the snippet iterates over model.parameters(). So the regularization term takes the parameter values as input and produces a scalar penalty as output. (This is not the same as weight normalization, which is a reparameterization of the weights: https://pytorch.org/docs/master/generated/torch.nn.utils.weight_norm.html)
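
As a rough illustration of that idea (the `nn.Linear` model below is only a placeholder, not your network), the regularization term is simply the sum of the absolute values of all parameters, scaled by a small factor and added to the task loss:

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model

# L1 penalty: sum of |w| over every parameter of the model
l1_penalty = sum(p.abs().sum() for p in model.parameters())

factor = 0.0005
# total_loss = task_loss + factor * l1_penalty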

On the other hand, the L1 loss is just a way to measure how much two values differ from each other, so the "loss" is just a measure of this difference. In the case of the L1 loss this error is computed as the mean absolute error, loss = |x - y|, where x and y are the values to compare (averaged over all elements by default). So the loss computation takes two values as input and produces a single value as output. Check this for the loss computation: https://pytorch.org/docs/master/generated/torch.nn.L1Loss.html
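
To make the distinction concrete, here is a tiny made-up example: `nn.L1Loss` averages the absolute element-wise differences between two tensors:

import torch
import torch.nn as nn

l1 = nn.L1Loss()                   # default reduction='mean'
x = torch.tensor([1.0, 2.0, 3.0])  # e.g. predictions
y = torch.tensor([1.5, 2.0, 5.0])  # e.g. targets
print(l1(x, y))                    # (0.5 + 0.0 + 2.0) / 3 ≈ 0.8333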

To answer your question: no, the two snippets are not equivalent. The first applies L1 regularization to the weights, while in the second you are computing a loss between a prediction and a target. This is what the loss computation looks like with some context:

sample, target = dataset[i]                       # one training example and its target
target_predicted = model(sample)                  # forward pass: the model's prediction
criterion = torch.nn.L1Loss()                     # mean absolute error between two tensors
loss_value = criterion(target_predicted, target)  # compare prediction against target
JVGD
  • She actually confirmed it was about L1 regularization; she just happened to have used L1Loss instead of the norm – Proko Jun 16 '20 at 10:34