
Let's imagine a network with two layers (X1, X2). I want to use the L1 norm on X1 and then do (loss + L1).backward() for X1. X2 should still be trained, but without the regularization. My goal is to make X1 become sparse.
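
Concretely, what I have in mind is roughly this (a minimal, self-contained sketch; the toy Net, the l1_lambda value and the random data are only placeholders for my real setup):

import torch
import torch.nn as nn

# Toy stand-in for the real network: X1 and X2 are the two layers.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.X1 = nn.Linear(10, 10)
        self.X2 = nn.Linear(10, 1)

    def forward(self, x):
        return self.X2(torch.relu(self.X1(x)))

model = Net()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
l1_lambda = 0.01  # placeholder regularization strength

inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(inputs), targets)

# L1 penalty built only from X1's parameters
l1_regularization = sum(p.abs().sum() for p in model.X1.parameters())

(loss + l1_lambda * l1_regularization).backward()
optimizer.step()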

I have already tried this approach; unfortunately, the regularization is applied to all layers, even though it only uses parameters from one layer.

I have also tried to freeze X1, call loss.backward(), and then freeze X2 instead to call loss.backward() again, this time including the regularization. Like this:

for parameter in model.X1.parameters():
    parameter.requires_grad = False

loss.backward(retain_graph=True)

for parameter in model.X1.parameters():
    parameter.requires_grad = True
for parameter in model.X2.parameters():
    parameter.requires_grad = False

loss += l1_regularization
loss.backward()
optimizer.step()

The outcome is not as expected, though: X2 does not get updated at all anymore, and the values in X1 seem to be too low (all weights become very close to zero).

What am I doing wrong, and is there any way to reach my goal? Thanks for your help!

enbo

1 Answer


Your second implementation should work. However, it doesn't show the part where you set requires_grad = True for X2 again afterwards (or at the start of each iteration, where you freeze X1). If that part is indeed missing from your code, then from the second iteration of your training loop onward, X2 will not get trained.
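
In other words, each iteration of your training loop should start by re-enabling the gradients it disabled in the previous one. A sketch of the loop body, reusing the names from your snippet (I'm assuming model with X1/X2, criterion, optimizer and a batch inputs/targets already exist):

# Re-enable all gradients that were switched off in the previous iteration
for parameter in model.parameters():
    parameter.requires_grad = True

optimizer.zero_grad()
loss = criterion(model(inputs), targets)

# First backward: X1 frozen, so only X2 accumulates grads from the plain loss
for parameter in model.X1.parameters():
    parameter.requires_grad = False
loss.backward(retain_graph=True)

# Second backward: X2 frozen, so only X1 accumulates grads from loss + L1
for parameter in model.X1.parameters():
    parameter.requires_grad = True
for parameter in model.X2.parameters():
    parameter.requires_grad = False

l1_regularization = sum(p.abs().sum() for p in model.X1.parameters())
(loss + l1_regularization).backward()

optimizer.step()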

Kroshtan
  • Thank you for your response! I tried it with two optimizers, but I get the following error message: `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!`. And where exactly should I add X2.requires_grad=True? Because by default it should be on. – enbo Jan 20 '22 at 14:35
  • Yes, X2.requires_grad is set to True by default, but only until your first loop iteration; after that it is set to False and never set back to True. With respect to the two optimizers, the second call to loss.backward() is probably the issue. I'll remove that for now. – Kroshtan Jan 20 '22 at 14:42
  • The idea was to set X2.requires_grad=False to only update the grads for X1. I just tried setting it to True before and after the second loss.backward(), but it still doesn't change anything. This also makes sense, because as I understand it, loss.backward() updates the grads and optimizer.step() applies them. – enbo Jan 20 '22 at 15:06
  • I just realized what you meant with your answer. It is absolutely correct and now it works. Thank you very much! – enbo Jan 20 '22 at 20:47