
I'm currently trying to solve Pendulum-v0 from the OpenAI Gym, an environment with a continuous action space. As a result, I need to use a normal distribution to sample my actions. What I don't understand is the dimension of the log_prob when using it:

import torch
from torch.distributions import Normal

means = torch.tensor([[0.0538],
        [0.0651]])  # shape (2, 1)
stds = torch.tensor([[0.7865],
        [0.7792]])  # shape (2, 1)

dist = Normal(means, stds)
a = torch.tensor([1.2, 3.4])  # shape (2,)
d = dist.log_prob(a)
print(d.size())

I was expecting a tensor of size 2 (one log_prob for each action), but it outputs a tensor of size (2, 2).

However, when using a Categorical distribution for a discrete environment, the log_prob has the expected size:

from torch.distributions import Categorical

logits = torch.tensor([[-0.0657, -0.0949],
        [-0.0586, -0.1007]])

dist = Categorical(logits=logits)
a = torch.tensor([1, 1])
print(dist.log_prob(a).size())

gives me a tensor of size (2).

Why is the log_prob for the Normal distribution a different size?

  • I suggest you provide a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example), i.e. a simple program that can be executed so that we can verify the behavior you're describing, rather than a screenshot of the program! – nbro Mar 19 '20 at 21:15
  • I edited my question with the code – Samuel Beaussant Mar 19 '20 at 21:28
  • The PyTorch documentation is very poor, so I completely understand you. Anyway, this documentation page [https://pytorch.org/docs/stable/distributions.html](https://pytorch.org/docs/stable/distributions.html) says that the PyTorch distribution module follows the same design as TensorFlow Probability. If that's really the case, then you may try having a look at the documentation of TFP. I am currently using TFP and I may be able to answer this question, but later. Ping me later, if you don't receive an answer meanwhile. – nbro Mar 19 '20 at 21:45
  • Okay thank you for your time – Samuel Beaussant Mar 19 '20 at 21:53

1 Answer


If one takes a look at the source code of torch.distributions.Normal and finds the definition of the log_prob(value) function, one can see that the main part of the calculation is:

return -((value - self.loc) ** 2) / (2 * var) - log_scale - math.log(math.sqrt(2 * math.pi))

where value is a variable containing the values for which you want to calculate the log probability (in your case, a), self.loc is the mean of the distribution (in your case, means), var is the variance, that is, the square of the standard deviation (in your case, stds ** 2), and log_scale is the logarithm of the standard deviation. One can see that this is indeed the logarithm of the probability density function of the normal distribution.
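As a quick check, here is a minimal sketch (reusing the numbers from the question) that reproduces log_prob by hand and shows where the (2, 2) shape comes from:

import math
import torch
from torch.distributions import Normal

means = torch.tensor([[0.0538], [0.0651]])  # shape (2, 1)
stds = torch.tensor([[0.7865], [0.7792]])   # shape (2, 1)
a = torch.tensor([1.2, 3.4])                # shape (2,)

dist = Normal(means, stds)

# Manual log-density, mirroring the source line quoted above
var = stds ** 2
manual = -((a - means) ** 2) / (2 * var) - stds.log() - math.log(math.sqrt(2 * math.pi))

print(torch.allclose(manual, dist.log_prob(a)))  # True
print(manual.size())  # torch.Size([2, 2]) -- the broadcast created the cross terms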

In the first example, you define means and stds to be column vectors, while a is a row vector:

means = torch.tensor([[0.0538],
    [0.0651]])   # shape (2, 1)
stds = torch.tensor([[0.7865],
    [0.7792]])   # shape (2, 1)
a = torch.tensor([1.2, 3.4])  # shape (2,)

But subtracting a row vector from a column vector, which is what value - self.loc does here via broadcasting, gives a matrix (try it!). The result you obtain is therefore a log_prob value for each of your two distributions and for each of the values in a.
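Here is a bare-bones illustration of that broadcasting rule, independent of the distribution machinery:

import torch

col = torch.tensor([[0.0538], [0.0651]])  # shape (2, 1)
row = torch.tensor([1.2, 3.4])            # shape (2,)

# (2,) broadcasts as (1, 2), so the difference has shape (2, 2)
print((row - col).size())  # torch.Size([2, 2])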

If you want to obtain a log_prob without the cross terms, then define the variables with consistent shapes, i.e., either

means = torch.tensor([[0.0538],
    [0.0651]])
stds = torch.tensor([[0.7865],
    [0.7792]])
a = torch.tensor([[1.2],[3.4]])

or

means = torch.tensor([0.0538, 0.0651])
stds = torch.tensor([0.7865, 0.7792])
a = torch.tensor([1.2, 3.4])

The latter is what you do in your second example, which is why you obtain the result you expected.
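For completeness, a quick sanity check that the second option gives the expected shape with Normal (a sketch reusing the question's numbers):

import torch
from torch.distributions import Normal

means = torch.tensor([0.0538, 0.0651])  # shape (2,)
stds = torch.tensor([0.7865, 0.7792])   # shape (2,)
a = torch.tensor([1.2, 3.4])            # shape (2,)

d = Normal(means, stds).log_prob(a)
print(d.size())  # torch.Size([2]) -- one log-prob per action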

– AndrisP