I'd like a simple example to illustrate how gradient clipping via clip_grad_norm_ works. From this post, I gathered that if the norm of a gradient exceeds a threshold, the gradient is rescaled to its unit vector times the threshold. That's what I tried:
import torch

v = torch.rand(5) * 1000
v_1 = v.clone()
torch.nn.utils.clip_grad_norm_(v_1, max_norm=1.0, norm_type=2)
print(v, v_1)
(tensor([381.2621, 935.3613, 664.9132, 840.0740, 443.0156]),
tensor([381.2621, 935.3613, 664.9132, 840.0740, 443.0156]))
I'd have thought it would do v / torch.norm(v, p=2) * max_norm (with max_norm = 1.0 here),
which should give me tensor([0.2480, 0.6083, 0.4324, 0.5463, 0.2881]).
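Spelled out with the values from the run above, that manual version checks out numerically:

import torch

v = torch.tensor([381.2621, 935.3613, 664.9132, 840.0740, 443.0156])
max_norm = 1.0
# rescale to the unit vector times the threshold
expected = v / torch.norm(v, p=2) * max_norm
print(expected)  # tensor([0.2480, 0.6083, 0.4324, 0.5463, 0.2881])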
It doesn't seem to do anything. I thought max_norm was the threshold value (the PyTorch documentation wasn't very clear on this). This post wasn't too helpful either.
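My best guess so far is that clip_grad_norm_ only rescales the .grad attribute of the tensors it's given, not their values, so my v_1 (which has no gradient attached) leaves it nothing to clip. A minimal sketch of that theory, assuming it rescales grad in place and returns the pre-clipping norm:

import torch
from torch.nn.utils import clip_grad_norm_

p = torch.nn.Parameter(torch.rand(5) * 1000)
p.grad = p.detach().clone()  # attach a fake gradient with a large norm

total_norm = clip_grad_norm_(p, max_norm=1.0, norm_type=2)
print(total_norm)  # norm of the gradient before clipping
print(p.grad)      # rescaled in place; its L2 norm is now ~1.0
print(p)           # the parameter values themselves are untouched

Is that right, or does max_norm mean something else entirely?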