Some answers already explain this well, but I'd like to give a concrete example to illustrate the mechanism.
Suppose we have the function z = 3x^2 + y^3.
The gradients of z w.r.t. x and y are dz/dx = 6x and dz/dy = 3y^2.

The initial values are x=1 and y=2:
import torch
import torch.optim as optim

x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)
z = 3*x**2 + y**3
print("x.grad: ", x.grad)
print("y.grad: ", y.grad)
print("z.grad: ", z.grad)
# print result should be:
x.grad: None
y.grad: None
z.grad: None
Then calculate the gradients of x and y at the current values (x=1, y=2); from the formulas above, dz/dx = 6*1 = 6 and dz/dy = 3*2^2 = 12.

# calculate the gradient
z.backward()
print("x.grad: ", x.grad)
print("y.grad: ", y.grad)
print("z.grad: ", z.grad)
# print result should be:
x.grad: tensor([6.])
y.grad: tensor([12.])
z.grad: None
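Note that z.grad stays None because z is not a leaf tensor; PyTorch only populates .grad for leaf tensors created with requires_grad=True. As a quick sanity check (a small sketch, not part of the update itself), the autograd results can be compared against the analytical gradients from above:

# optional sanity check: compare autograd's results with the analytical gradients
# dz/dx = 6x and dz/dy = 3y^2, evaluated at x=1, y=2
assert torch.allclose(x.grad, 6 * x.detach())         # 6 * 1 = 6
assert torch.allclose(y.grad, 3 * y.detach() ** 2)    # 3 * 2^2 = 12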
Finally, use the SGD optimizer to update the values of x and y according to the update rule x = x - lr * dz/dx and y = y - lr * dz/dy:

# create an optimizer, pass x and y as the parameters to be updated, and set the learning rate lr=0.1
optimizer = optim.SGD([x, y], lr=0.1)
# execute a single update step
optimizer.step()
# print the updated values of x and y
print("x:", x)
print("y:", y)
# print result should be:
x: tensor([0.4000], requires_grad=True)
y: tensor([0.8000], requires_grad=True)
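These match the hand computation: x = 1 - 0.1*6 = 0.4 and y = 2 - 0.1*12 = 0.8. In a real training loop these three steps are simply repeated; below is a minimal sketch (reusing the same x, y, and optimizer as above). Note that gradients accumulate in .grad, so they must be zeroed before each new backward() call:

# minimal sketch of repeating the steps above (assumes x, y, optimizer from earlier)
for _ in range(5):
    optimizer.zero_grad()     # clear gradients accumulated from the previous iteration
    z = 3*x**2 + y**3         # recompute z with the updated x and y
    z.backward()              # compute fresh gradients dz/dx and dz/dy
    optimizer.step()          # apply x = x - lr * x.grad, y = y - lr * y.grad
print("x:", x)
print("y:", y)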