The code at the end of this post is just a template; you see this pattern a lot in AI code. I have a specific question about loss.backward(). In the code we have a model, and because we pass model.parameters() to the optimizer, optimizer and model are somehow connected. But there is no visible connection between loss_fn and model, or between loss_fn and optimizer. So how exactly does loss.backward() work?
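To make the question concrete, here is a minimal sketch that is separate from the template. It assumes a single torch.nn.Linear layer and random data as stand-ins, and it only inspects where a link between loss and model could live. As far as I can tell, the loss module itself holds no parameters, while the tensors y_pred and loss each carry a grad_fn that records how they were computed.

```python
import torch

# Stand-in model and data (assumptions for illustration, not the template below).
model = torch.nn.Linear(3, 1)
loss_fn = torch.nn.MSELoss(reduction='sum')

x = torch.randn(8, 3)
y = torch.randn(8, 1)

y_pred = model(x)
loss = loss_fn(y_pred, y)

# The loss module has no parameters of its own.
print(list(loss_fn.parameters()))

# The tensors record the operations that produced them.
print(y_pred.grad_fn)
print(loss.grad_fn)

# Before backward, the model's weight has no gradient yet.
print(model.weight.grad)

# backward() walks the recorded graph from loss back to the model's parameters.
loss.backward()
print(model.weight.grad.shape)
```

If this reading is right, the connection is made through y_pred rather than through loss_fn or the optimizer.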
For example, suppose I add a second instance of MSELoss, loss_fn_2 = torch.nn.MSELoss(reduction='sum'), to the code and do exactly the same thing: loss_2 = loss_fn_2(y_pred, y) followed by loss_2.backward(). How does PyTorch recognize that loss_2 is not related to model and that only loss is?
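The experiment described above can be sketched as follows. This is a reduced version under stated assumptions: a single torch.nn.Linear layer and random data stand in for the template's model and inputs, and retain_graph=True is passed on the first call because both losses share the graph behind y_pred. In this sketch the second backward() call adds to the same gradients, which suggests that loss_2 is also tied to model through y_pred.

```python
import torch

torch.manual_seed(0)

# Stand-in model and data (assumptions for illustration).
model = torch.nn.Linear(3, 1)
x = torch.randn(8, 3)
y = torch.randn(8, 1)

loss_fn = torch.nn.MSELoss(reduction='sum')
loss_fn_2 = torch.nn.MSELoss(reduction='sum')

# Both losses are computed from the same y_pred.
y_pred = model(x)
loss = loss_fn(y_pred, y)
loss_2 = loss_fn_2(y_pred, y)

# Keep the shared graph alive so it can be traversed a second time.
loss.backward(retain_graph=True)
grad_after_first = model.weight.grad.clone()

# Gradients accumulate in .grad instead of being overwritten.
loss_2.backward()
grad_after_second = model.weight.grad.clone()

# Same loss value computed twice, so the accumulated gradient doubles.
print(torch.allclose(grad_after_second, 2 * grad_after_first))
```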
Consider a scenario: I would like to have (model_a, loss_fn_a, and optimizer_a) and (model_b, loss_fn_b, and optimizer_b), and I want *_a and *_b to be isolated from each other.
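A minimal sketch of that scenario, assuming each set is built from its own model instance. The make_triple helper is hypothetical and exists only for this illustration, and the single torch.nn.Linear layer and random data are stand-ins. In this sketch, a training step on the *_a set leaves model_b without gradients and with unchanged weights.

```python
import torch

def make_triple():
    # Hypothetical helper: builds one independent (model, loss_fn, optimizer) set.
    model = torch.nn.Linear(3, 1)
    loss_fn = torch.nn.MSELoss(reduction='sum')
    optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
    return model, loss_fn, optimizer

model_a, loss_fn_a, optimizer_a = make_triple()
model_b, loss_fn_b, optimizer_b = make_triple()

# Stand-in data (an assumption for illustration).
x = torch.randn(8, 3)
y = torch.randn(8, 1)

weights_a_before = model_a.weight.detach().clone()
weights_b_before = model_b.weight.detach().clone()

# One training step that involves only the *_a set.
optimizer_a.zero_grad()
loss_a = loss_fn_a(model_a(x), y)
loss_a.backward()
optimizer_a.step()

# model_a received gradients and was updated; model_b was not touched.
print(model_a.weight.grad is not None)
print(model_b.weight.grad is None)
print(torch.equal(model_b.weight, weights_b_before))
```

The design choice here is that each optimizer only receives the parameters of its own model, and each loss is computed only from its own model's output.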
import torch
import math
# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# Prepare the input tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)
# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')
# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use RMSprop; the optim package contains many other
# optimization algorithms. The first argument to the RMSprop constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(xx)
    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()
    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()
    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()
linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')