TL;DR: iterating over the generator fails, iterating over a list succeeds. Why?
I am trying to change my model's parameters manually like so:
(1st code, works)

```python
delta = r_t + gamma * expected_reward_from_t1.data - expected_reward_from_t.data
negative_expected_reward_from_t = -expected_reward_from_t

self.critic_optimizer.zero_grad()
negative_expected_reward_from_t.backward()

for i, p in enumerate(self.critic_nn.parameters()):
    if not p.requires_grad:
        continue
    p.grad[:] = delta.squeeze() * discount * p.grad

self.critic_optimizer.step()
```
This version converges to the correct result every time.
However, when I move the gradient update into a helper method:

(2nd code, fails)
```python
def _update_grads(self, delta, discount):
    params = self.critic_nn.parameters()
    for i, p in enumerate(params):
        if not p.requires_grad:
            continue
        p.grad[:] = delta.squeeze() * discount * p.grad
```
and then call it like this:
```python
delta = r_t + gamma * expected_reward_from_t1.data - expected_reward_from_t.data
negative_expected_reward_from_t = -expected_reward_from_t

self.critic_optimizer.zero_grad()
negative_expected_reward_from_t.backward()
self._update_grads(delta=delta, discount=discount)
self.critic_optimizer.step()
```
The only change seems to be that `self.critic_nn.parameters()` is now stored in a temporary local variable `params`, yet the network no longer converges.
(3rd code, works again)

When I replace `params = self.critic_nn.parameters()` with `params = list(self.critic_nn.parameters())` inside `_update_grads`, convergence is restored.
This looks like a referencing issue in PyTorch that I don't completely understand; in particular, I don't fully understand what `parameters()` returns.
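For what it's worth, a minimal standalone check (using a plain `nn.Linear` as a stand-in for `critic_nn`) suggests that `parameters()` returns a one-shot generator rather than a list:

```python
import torch.nn as nn

# Minimal sketch: nn.Linear stands in for the actual critic network.
net = nn.Linear(4, 2)

params = net.parameters()   # a generator object, not a list
print(type(params))         # <class 'generator'>
print(len(list(params)))    # 2 -- weight and bias
print(len(list(params)))    # 0 -- the generator is already exhausted
```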
The question: why do the 1st and 3rd snippets work, but the 2nd one doesn't?