
TL;DR - using a generator fails, using a list succeeds. Why?

I am trying to change my model's parameters manually like so:

(1st code, works)

    delta = r_t + gamma * expected_reward_from_t1.data - expected_reward_from_t.data

    negative_expected_reward_from_t = -expected_reward_from_t
    self.critic_optimizer.zero_grad()
    negative_expected_reward_from_t.backward()

    for i, p in enumerate(self.critic_nn.parameters()):
        if not p.requires_grad:
            continue
        p.grad[:] = delta.squeeze() * discount * p.grad

    self.critic_optimizer.step()

and it seems to converge on the correct result 100% of the time.

However,

When attempting to use a function like so:

(2nd code, fails)

    def _update_grads(self, delta, discount):
        params = self.critic_nn.parameters()
        for i, p in enumerate(params):
            if not p.requires_grad:
                continue
            p.grad[:] = delta.squeeze() * discount * p.grad

and then

    delta = r_t + gamma * expected_reward_from_t1.data - expected_reward_from_t.data

    negative_expected_reward_from_t = -expected_reward_from_t
    self.critic_optimizer.zero_grad()
    negative_expected_reward_from_t.backward()
    self._update_grads(delta=delta, discount=discount)

    self.critic_optimizer.step()

The only change I seem to have made is storing `self.critic_nn.parameters()` in a temporary local variable `params`.

Now the network does not converge.

(3rd code, again, works)

When I replace `params = self.critic_nn.parameters()` in `_update_grads` with `params = list(self.critic_nn.parameters())`, convergence is restored.


This seems like a referencing issue in PyTorch that I do not completely understand. I don't fully understand what `parameters()` returns.
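For context, `nn.Module.parameters()` is a generator function, so it returns a single-use iterator rather than a container. Here is a minimal pure-Python sketch of that difference, using a toy stand-in function instead of an actual PyTorch module:

```python
# Toy stand-in for nn.Module.parameters(); the real method is also a
# generator function (it uses `yield` internally), so it returns a
# single-use iterator, not a container.
def parameters():
    yield "weight"
    yield "bias"

gen = parameters()
print(list(gen))  # ['weight', 'bias'] -- the first pass consumes the generator
print(list(gen))  # []                 -- the generator is now exhausted

params = list(parameters())
print(list(params))  # ['weight', 'bias'] -- a list can be re-iterated
print(list(params))  # ['weight', 'bias']
```

Note that in the 2nd code above the generator is created fresh on every call to `_update_grads`, so by itself this single-use behavior would not explain the difference, but it is the main way a generator and a list of parameters behave differently.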


The question: Why do the 1st and 3rd codes work, but the 2nd one doesn't?

Gulzar
    In the 2nd example, your code `p.grad[:]...` seems to be out of for loop. Did it unintentionally happen while pasting on stackoverflow? – kHarshit Feb 20 '19 at 18:04
  • @kHarshit yes. edited now. – Gulzar Feb 20 '19 at 18:22
  • @Gulzar It returns a generator as you can see from [here](https://github.com/pytorch/pytorch/blob/b2dde4386adf110262203a57d8cc552309be0e27/torch/nn/modules/module.py#L786). At first "the network does not converge" sounds more like an issue with your algorithm rather than a programming problem. The source code doesn't reveal any side effects of exhausting the generator, so the non-convergence seems to result from different reasons. Have you ensured complete comparability between the three cases (in terms of initialization, randomness, ...)? Also it would be helpful to show the surrounding code. – a_guest Feb 20 '19 at 22:32
  • @a_guest the ONLY difference is using a list or a generator. Since the question wasn't about the algorithm, I thought I would spare the details from you. The behavior is different in every case, and the random seed is always the same. case 1 and 3 converge on some (wrong) value, case 2 doesn't converge at all. For more specific details, see https://stackoverflow.com/questions/54734556/pytorch-how-to-create-an-update-rule-that-doesnt-come-from-derivatives – Gulzar Feb 20 '19 at 23:02
  • @Gulzar I see. However as the question stands the problem is not reproducible. As I mentioned the generator function does not have any side-effects so I suppose the problem lies somewhere else in the code. If you could provide a [complete and verifiable example](https://stackoverflow.com/help/mcve) this would increase the chances of the question being answered. – a_guest Feb 21 '19 at 20:36
  • Thanks, I will get to it asap :) – Gulzar Feb 21 '19 at 21:28

0 Answers