12

This is a very basic question, but I can't seem to find a good answer. What exactly does scipy calculate for

scipy.stats.norm(50,10).pdf(45)

I understand that the probability of a particular value like 45 in a Gaussian with mean 50 and standard deviation 10 is 0. So what exactly is pdf calculating? Is it the area under the Gaussian curve, and if so, what is the range of values on the x axis?

max_max_mir
  • 3
    Interesting question, but not at all a programming question. You may want to start with this wikipedia article: https://en.wikipedia.org/wiki/Probability_density_function – cel Apr 25 '17 at 06:41
  • The question "What is a probability density function?" should be asked over at https://stats.stackexchange.com/ – Warren Weckesser Apr 25 '17 at 20:34
  • 1
    In fact, it has already been asked: https://stats.stackexchange.com/questions/86094/what-is-a-density-function – Warren Weckesser Apr 25 '17 at 20:38
  • You said that "probability of a particular value like 45 in a gaussian with mean 50 and std dev 10 is 0". But to me it is not 0, but a float value AND running the line of code you get: `0.03520653267642995` – Dave Sep 08 '21 at 13:19

2 Answers

16

The probability density function of the normal distribution expressed in Python is

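from math import exp, pi, sqrt
from scipy import stats


def normal_pdf(x, mu, sigma):
    # Density of the normal distribution with mean mu and standard deviation sigma, evaluated at x.
    # sqrt() avoids the integer division in (1/2), which evaluates to 0 under Python 2.
    return 1.0 / (sigma * sqrt(2.0 * pi)) * exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))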

(compare that to the Wikipedia definition). And this is exactly what scipy.stats.norm().pdf() computes: the value of the pdf at the point x for a given mu and sigma.
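For reference, this is the formula that the function above transcribes, written out with the same symbols (x, mu, sigma), where sigma is the standard deviation:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```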

Note that this is not a probability (= area under the pdf) but rather the value of the pdf at the point x you pass to pdf(x), and that value can very well be greater than 1.0! You can see that, for example, for a normal distribution with mean 0 and standard deviation 0.1 at x = 0:

val = stats.norm(0, 0.1).pdf(0)
print(val)
val = normal_pdf(0, 0, 0.1)
print(val)

which gives the output

3.98942280401

3.989422804014327

Both values are greater than 1, so this is clearly not a probability (an area under the curve)!

Note that this doesn't contradict the statement that the probability of a particular value like x = 0 is 0. Formally, the probability of a single point is the area under the pdf over an interval of length 0, which is zero: if f is a continuous function on [a, b] and F is its antiderivative on [a, b], then the definite integral of f over [a, b] is F(b) - F(a). Here a = b = x, hence the value of the integral is F(x) - F(x) = 0.
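To make the difference between a density and a probability concrete, here is a minimal sketch using the question's numbers (mean 50, standard deviation 10). It uses the standard library's statistics.NormalDist (Python 3.8+) as a stand-in so that it runs without extra packages; the frozen scipy.stats.norm(50, 10) object offers pdf and cdf methods as well. The interval bounds and the width delta are arbitrary values picked for illustration. No delta is involved in evaluating pdf itself; a width only enters when you ask for the probability of an interval.

```python
from statistics import NormalDist

# Mean 50, standard deviation 10 (the values from the question).
dist = NormalDist(mu=50, sigma=10)

# Density at a single point: the height of the curve at x = 45, not a probability.
density_at_45 = dist.pdf(45)

# Probability of an interval: the area under the curve between 44 and 46,
# obtained as a difference of cumulative distribution function values.
prob_44_to_46 = dist.cdf(46) - dist.cdf(44)

# For a narrow interval around 45, the area is approximately height * width.
delta = 0.01  # illustrative width, chosen arbitrarily
exact = dist.cdf(45 + delta / 2) - dist.cdf(45 - delta / 2)
approx = density_at_45 * delta

print(density_at_45)
print(prob_44_to_46)
print(exact, approx)
```

As delta shrinks towards 0, both exact and approx shrink towards 0 as well, which is why the probability of the single value 45 is 0 even though the density there is not.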

Stefan Zobel
1

What you are getting is the value of the pdf at x for a normal distribution with mean 50 and standard deviation 10 (check the function here).

It is easy to visualize using

from scipy.stats import norm
import matplotlib.pyplot as plt
npdf = norm(50, 10)
plt.plot(range(0, 100), npdf.pdf(range(0, 100)), 'k-', lw=2)

You could also generate random samples from the normal distribution you created using

npdf.rvs(size=1000)  # 1000 random numbers
hist = plt.hist(npdf.rvs(size=10000), bins=100, density=True)  # density=True replaces the removed normed=True

Figure: theoretical pdf and normalized histogram of the random samples
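The link between a normalized histogram and the pdf can also be checked numerically. The following is a rough sketch, not the code behind the plot: it draws samples with the standard library's random.gauss instead of npdf.rvs, and the seed, the sample count and the bin edges are arbitrary choices for illustration. The height of one normalized histogram bin is the fraction of samples that fall in the bin divided by the bin width, which should land close to the pdf value at the bin centre.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so repeated runs give the same samples
dist = NormalDist(mu=50, sigma=10)
samples = [random.gauss(50, 10) for _ in range(100_000)]

# A single histogram bin covering [44.5, 45.5), centred on x = 45.
lo, hi = 44.5, 45.5
fraction = sum(lo <= s < hi for s in samples) / len(samples)

# Normalized bin height: fraction of samples in the bin divided by the bin width.
bin_height = fraction / (hi - lo)

print(bin_height)    # empirical density estimate near x = 45
print(dist.pdf(45))  # theoretical density at x = 45
```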

suvy
  • 1
    Thanks - but I am still not clear on how exactly the pdf is calculated at value x. Is there a range x - delta, x + delta used to calculate the area under the normal distribution? If so, what is the delta used? – max_max_mir Apr 25 '17 at 06:10