
In some libraries (e.g. machine learning ones), we can find a `log_prob` function. What does it do, and how is it different from simply taking the regular log?

For example, what is the purpose of this code:

from torch.distributions import Normal

dist = Normal(mean, std)
sample = dist.sample()
logprob = dist.log_prob(sample)

And subsequently, why would we first take a log and then exponentiate the resulting value instead of just evaluating it directly:

prob = torch.exp(dist.log_prob(sample))
cerebrou
  • You ever find an answer? I was kind of hoping there was a direct way to compute PDFs in torch. This is close but annoying you have to exp it. – JustinBlaber Jul 28 '20 at 00:49

5 Answers


As your own answer mentions, log_prob returns the logarithm of the density or probability. Here I will address the remaining points in your question:

  • How is that different from log? Distributions do not have a method log. If they did, the closest possible interpretation would indeed be something like log_prob, but it would not be a very precise name, since it begs the question "log of what?" A distribution has multiple numeric properties (for example its mean, variance, etc.) and the probability or density is just one of them, so the name would be ambiguous.

The same does not apply to the Tensor.log() method (which may be what you had in mind) because Tensor is itself a mathematical quantity we can take the log of.

  • Why take the log of a probability only to exponentiate it later? You may not need to exponentiate it later. For example, if you have the logs of probabilities p and q, then you can directly compute log(p * q) as log(p) + log(q), avoiding intermediate exponentiations. This is more numerically stable (avoiding underflow) because probabilities may become very close to zero while their logs do not. Addition is also more efficient than multiplication in general, and its derivative is simpler. There is a good article about those topics at https://en.wikipedia.org/wiki/Log_probability.
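The underflow point is easy to demonstrate with a small sketch (the probability 0.01 and the count 1,000 are made up purely for illustration):

```python
import torch

# 1,000 hypothetical independent events, each with probability 0.01.
p = torch.full((1000,), 0.01)

# Multiplying the probabilities directly underflows to exactly 0.0 in float32 ...
direct = p.prod()

# ... while summing the logs stays finite and usable.
log_total = p.log().sum()

print(direct.item())     # 0.0
print(log_total.item())  # roughly 1000 * log(0.01), i.e. about -4605.17
```

The product is lost to underflow, but the log-space sum still carries the full information and can be compared against other log-probabilities directly.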
user118967
  • Is the data structure returned by `dist.log_prob()` just a number? Or does it contain the gradient function? In some code, I found that it returns a tensor with the 2nd component being the gradient, am I mistaken? – Yan King Yin Jul 28 '22 at 13:21
  • @YanKingYin, that would depend on the specific system you are using. In PyTorch distributions and most other systems I know, it's just a number (or tensor if you're using batches). – user118967 Jul 29 '22 at 16:19

log_prob takes the log of the probability (of some actions). Example:

import torch
import torch.nn.functional as F

action_logits = torch.rand(5)
action_probs = F.softmax(action_logits, dim=-1)
action_probs

Returns:

tensor([0.1457, 0.2831, 0.1569, 0.2221, 0.1922])

Then:

from torch.distributions import Categorical

dist = Categorical(action_probs)
action = dist.sample()
print(dist.log_prob(action), torch.log(action_probs[action]))

Returns:

tensor(-1.8519) tensor(-1.8519)

DSH

Part of the answer is that log_prob returns the log of the probability density/mass function evaluated at the given sample value.
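For a continuous distribution, this means exponentiating log_prob recovers the density itself. A minimal check with a standard normal (the parameters and evaluation point are chosen arbitrarily here):

```python
import math
import torch
from torch.distributions import Normal

dist = Normal(0.0, 1.0)  # standard normal, for illustration
x = torch.tensor(0.5)

# log_prob is the log of the density function evaluated at x ...
lp = dist.log_prob(x)

# ... which matches the Gaussian density written out by hand.
pdf_by_hand = math.exp(-x.item() ** 2 / 2) / math.sqrt(2 * math.pi)
print(lp.exp().item(), pdf_by_hand)  # both approximately 0.3521
```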

cerebrou

Answer

logprob = dist.log_prob(sample) means getting the logarithmic probability (logprob) of one experiment sample (sample) under a specific distribution (dist).

(This is awkward to grasp at first; the explanation below should help.)

Explanation

(We use an easy example to understand what log_prob does.)

Forward test

Firstly, generate a probability a by using a uniform distribution bounded in [0, 1],

import torch.distributions as D
import torch

a = torch.empty(1).uniform_(0, 1)
a
# OUTPUT: tensor([0.3291])

Based on this probability and D.Bernoulli, we can instantiate a Bernoulli distribution b=D.Bernoulli(a) (which means, the result of every Bernoulli experiment, b.sample(), is either 1 with probability a=0.3291 or 0 with probability 1-a=0.6709),

b = D.Bernoulli(a)
b
# OUTPUT: Bernoulli()

We can verify this with one Bernoulli experiment to get a sample c (note that c has probability 0.3291 of being 1 and probability 0.6709 of being 0),

c = b.sample()
c
# OUTPUT: tensor([0.])

With the Bernoulli distribution b and the sample c, we can get the logarithmic probability of c (a Bernoulli experiment sample) under the distribution b (a specific Bernoulli distribution with probability 0.3291 of being TRUE), or officially, the log of the probability density/mass function evaluated at the value c,

b.log_prob(c)
# OUTPUT: tensor([-0.3991])

Backward verification

As we already know, the probability of each sample being 0 is 0.6709 (for a single experiment, this is simply the value of its probability mass function at 0), so we can verify the log_prob result with,

torch.log(torch.tensor(0.6709))
# OUTPUT: tensor(-0.3991)

It equals the logarithmic probability of c under b. (Finished!)

Hope it's useful for you.

Libo Huang

Given a probability distribution, the log_prob function computes the log probability of the sampled actions. However, a catch makes it different from the torch.log function: for a Bernoulli distribution, it is equivalent to torch.log(P) for the entries where the action is 1, and to torch.log(1-P) otherwise.

For example,

import torch
from torch.distributions import Bernoulli

prob = torch.rand(5)
m = Bernoulli(prob)
act = m.sample()

act
tensor([0., 0., 1., 0., 0.])

prob
tensor([0.4880, 0.5403, 0.1671, 0.1158, 0.1695])

m.log_prob(act)
tensor([-0.6694, -0.7773, -1.7889, -0.1230, -0.1857])

torch.log(prob)
tensor([-0.7175, -0.6155, -1.7889, -2.1562, -1.7751])

torch.log(1-prob)
tensor([-0.6694, -0.7773, -0.1829, -0.1230, -0.1857])