Let's imagine a network with 2 layers (X1, X2). I want to compute an L1 norm over X1's parameters and then call (loss + L1).backward(), so that X1 is regularized while X2 is still trained, but without the regularization. My goal is to make X1 sparse.
I have already tried this, but unfortunately the regularization ends up affecting all layers, even though the L1 term only uses parameters from one layer.
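Roughly, that first attempt looked like this (simplified; criterion, inputs, targets, and l1_lambda stand in for my actual setup, and the L1 term is summed over model.X1.parameters() only):

import torch

# hypothetical names: model, criterion, optimizer, inputs, targets, l1_lambda
output = model(inputs)
loss = criterion(output, targets)

# L1 penalty built only from X1's parameters
l1_regularization = torch.zeros(1, device=loss.device)
for param in model.X1.parameters():
    l1_regularization = l1_regularization + param.abs().sum()

optimizer.zero_grad()
(loss + l1_lambda * l1_regularization).backward()
optimizer.step()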
I have also tried to freeze X1, call loss.backward(), and then freeze X2 and call loss.backward() again, this time including the regularization. Like this:
# intended: update only X2 from the plain loss
for parameter in model.X1.parameters():
    parameter.requires_grad = False
loss.backward(retain_graph=True)

# intended: update only X1 from the loss plus the L1 term
for parameter in model.X1.parameters():
    parameter.requires_grad = True
for parameter in model.X2.parameters():
    parameter.requires_grad = False
loss += l1_regularization
loss.backward()

optimizer.step()
The outcome is not as expected, though: X2 does not get updated at all anymore, and the values in X1 seem to be pushed too low (all weights become very close to zero).
What am I doing wrong, and is there any way to reach my goal? Thanks for your help!