How to use norm.ppf()?

Question

I couldn't understand how to properly use this function, could someone please explain it to me?

Let's say I have:

a mean of 172.7815
a standard deviation of 4.1532
N = 50 (50 samples)

When I'm asked to calculate the (95%) margin of error using norm.ppf() will the code look like below?

norm.ppf(0.95, loc=172.78, scale=4.15)

or will it look like this?

norm.ppf(0.95, loc=0, scale=1)

Because I know it's calculating the area of the curve to the right of the confidence interval (95%, 97.5% etc...see image below), but when I have a mean and a standard deviation, I get really confused as to how to use the function.

in many cases, as explained in [this answer](https://stackoverflow.com/a/73900913/19123103), inverse survival function `norm.isf()` is more intuitive. — cottontail, Nov 16 '22 at 21:11

jameshollisandrew · Answer 1 · 2020-05-02T10:35:01.223

The method norm.ppf() takes a percentage and returns a standard deviation multiplier for what value that percentage occurs at.

It is equivalent to a one-tailed test on the density plot.

From scipy.stats.norm:

ppf(q, loc=0, scale=1) Percent point function (inverse of cdf — percentiles).

Standard Normal Distribution

The code:

norm.ppf(0.95, loc=0, scale=1)

Returns a 95% significance interval for a one-tail test on a standard normal distribution (i.e. a special case of the normal distribution where the mean is 0 and the standard deviation is 1).
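As a quick check of the standard-normal call above, here is a minimal sketch, assuming SciPy is installed (the variable names are illustrative, not from the original post). It shows the value returned for 0.95 and that feeding it back into norm.cdf() recovers the probability:

```python
from scipy.stats import norm

# 95th percentile of the standard normal (mean 0, standard deviation 1).
z_95 = norm.ppf(0.95, loc=0, scale=1)

# Passing the result back through the CDF recovers the probability.
p_back = norm.cdf(z_95)

print(round(z_95, 4))    # about 1.6449
print(round(p_back, 4))  # 0.95
```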

Our Example

For the OP's example, the value at which the 95% cutoff lies (for a one-tailed test) would be computed with:

norm.ppf(0.95, loc=172.7815, scale=4.1532)

This will return a value (that functions as a 'standard-deviation multiplier') marking where 95% of data points would be contained if our data is a normal distribution.

To get the exact number, we take the norm.ppf() output and multiply it by our standard deviation for the distribution in question.

A Two-Tailed Test

If we need to calculate a 'Two-tail test' (i.e. We're concerned with values both greater and less than our mean) then we need to split the significance (i.e. our alpha value) because we're still using a calculation method for one-tail. The split in half symbolizes the significance level being appropriated to both tails. A 95% significance level has a 5% alpha; splitting the 5% alpha across both tails returns 2.5%. Taking 2.5% from 100% returns 97.5% as an input for the significance level.

Therefore, if we were concerned with values on both sides of our mean, our code would input .975 to represent a 95% significance level across two-tails:

norm.ppf(0.975, loc=172.7815, scale=4.1532)
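The alpha-splitting arithmetic can be sketched on the standard normal, where the result is the familiar 1.96. This is only an illustration under that assumption; the names confidence, upper_q and lower_q are hypothetical:

```python
from scipy.stats import norm

confidence = 0.95
alpha = 1 - confidence      # 5% in total, shared by both tails
upper_q = 1 - alpha / 2     # 0.975
lower_q = alpha / 2         # 0.025

# Quantiles of the standard normal (loc=0, scale=1 are the defaults).
z_upper = norm.ppf(upper_q)  # about 1.96
z_lower = norm.ppf(lower_q)  # about -1.96, by symmetry

print(round(z_upper, 2), round(z_lower, 2))
```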

Margin of Error

Margin of error is a significance level used when estimating a population parameter with a sample statistic. We want to generate our 95% confidence interval using the two-tailed input to norm.ppf() since we're concerned with values both greater and less than our mean:

ppf = norm.ppf(0.975, loc=172.7815, scale=4.1532)

Next, we'd take the ppf and multiply it by our standard deviation to return the interval value:

interval_value = std * ppf

Finally, we'd mark the confidence intervals by adding & subtracting the interval value from the mean:

lower_95 = mean - interval_value
upper_95 = mean + interval_value

Plot with a vertical line:

_ = plt.axvline(lower_95, color='r', linestyle=':')
_ = plt.axvline(upper_95, color='r', linestyle=':')

the mean put in loc and standard deviation in scale, are these sample's mean and std or population's parameters? — kikatuso, Nov 06 '20 at 11:51
@kikatuso The above example receives the sample's values. Sample values are input into the margin of error function to estimate confidence in the sample representing the population parameter. Therefore, sample values are input into the function, and margin of error is output. User uses output to evaluate how well the sample represents the population (i.e. How much 'confidence' the user should have that the sample aligns with the population - so assumptions from the sample can be projected back onto the population, etc.). Hope this helps! Sorry for the delayed response! — jameshollisandrew, Nov 10 '20 at 20:22
The documentation of ppf() states it is the inverse of the cdf. So it should take a fraction of cdf and return data value equivalent to it. It could be simple -- I don't understand why it is defined in terms of the moments? Is there an alternative? I actually need to use for a cauchy, where moments aren't defined. — shaunc, Feb 25 '21 at 19:15

score 15 · Answer 2 · edited Sep 29 '22 at 19:47

James' statement that norm.ppf returns a "standard deviation multiplier" is wrong. This feels pertinent as his post is the top Google result when one searches for norm.ppf.

norm.ppf is the inverse of norm.cdf. In the example, it simply returns the value at the 95th percentile of the distribution described by loc and scale. There is no "standard deviation multiplier" involved.

A better answer exists here: How to calculate the inverse of the normal cumulative distribution function in python?
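The point can be checked numerically. The sketch below uses the numbers from the question and assumes SciPy is installed; the variable names are illustrative. With loc and scale supplied, the result is already on the data scale and equals loc + scale * z, so no further multiplication is needed:

```python
from scipy.stats import norm

mean, std = 172.7815, 4.1532  # values from the question

# With loc/scale given, ppf returns a value on the data scale.
x_95 = norm.ppf(0.95, loc=mean, scale=std)   # about 179.61

# The standard-normal quantile is the only "multiplier" involved.
z_95 = norm.ppf(0.95)                        # about 1.645

# The two are related by x = loc + scale * z.
same = abs(x_95 - (mean + std * z_95)) < 1e-9

# ppf inverts cdf for the same loc and scale.
roundtrip = norm.cdf(x_95, loc=mean, scale=std)

print(round(x_95, 2), same, round(roundtrip, 4))
```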

answered Aug 25 '21 at 18:01 by sekwjlwf · edited Sep 29 '22 at 19:47 by cottontail

This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). – PM 77-1 Aug 27 '21 at 15:40

To re-iterate, the top answer is incorrect. This is important because the thread is still the top result on Google when one searches for "norm.ppf". If you actually try to read and comprehend, my post actually does answer the question, and provide a reference to an even more detailed explanation. As quoted from the link @PM77-1 provided: "Generally, truly important information should be incorporated into an answer anyway" – sekwjlwf Jan 13 '22 at 02:18

score 3 · Answer 3 · edited Sep 29 '22 at 20:15

You can figure out the confidence interval with norm.ppf directly, without calculating the margin of error first:

import numpy as np
from scipy.stats import norm
upper_of_interval = norm.ppf(0.975, loc=172.7815, scale=4.1532/np.sqrt(50))
lower_of_interval = norm.ppf(0.025, loc=172.7815, scale=4.1532/np.sqrt(50))

4.1532 is the sample standard deviation, not the standard deviation of the sampling distribution of the sample mean (the standard error). So scale in norm.ppf is specified as 4.1532 / np.sqrt(50), which estimates the standard deviation of the sampling distribution.

(The standard deviation of the sampling distribution equals population standard deviation / np.sqrt(sample size). Here the population standard deviation is unknown and the sample size is more than 30, so sample standard deviation / np.sqrt(sample size) can be used as a good estimator.)

Margin of error can be calculated with (upper_of_interval - lower_of_interval) / 2.
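For comparison, here is a minimal sketch of the same margin of error computed as z times the standard error, using the numbers from the question. It assumes a normal approximation is acceptable, as this answer does; norm.interval() is shown only as a cross-check and the variable names are illustrative:

```python
import numpy as np
from scipy.stats import norm

mean, std, n = 172.7815, 4.1532, 50

# Estimated standard deviation of the sample mean (standard error).
standard_error = std / np.sqrt(n)

# Two-sided 95% critical value from the standard normal.
z = norm.ppf(0.975)
margin = z * standard_error   # about 1.15

# Cross-check: central 95% interval of N(mean, standard_error).
lower, upper = norm.interval(0.95, loc=mean, scale=standard_error)

print(round(margin, 2), round(lower, 2), round(upper, 2))
```

Half the width of the interval, (upper - lower) / 2, matches margin up to floating-point error.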

[Image: the 2.5% and 97.5% cut points used with norm.ppf()]

score 0 · Answer 4 · answered Apr 13 '21 at 20:50

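Calculate the value at the 95th percentile, then draw a vertical line and an annotation with that value:

import matplotlib.pyplot as plt
from scipy.stats import norm

mean = 172.7815
std = 4.1532
N = 50

# Draw N random samples, then find the 95th percentile of N(mean, std)
results = norm.rvs(mean, std, size=N)
pct_95 = norm.ppf(0.95, mean, std)

plt.hist(results, bins=10)
plt.axvline(pct_95)
plt.annotate(f"{pct_95:.2f}", xy=(pct_95, 6))
plt.show()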

score 0 · Answer 5 · answered Sep 29 '22 at 19:49

As other answers pointed out, norm.ppf(1-alpha) returns the value at the (1-alpha)*100-th percentile of the normal distribution specified by the parameters passed to it. For example, the first call in the OP returns the 95th percentile of a normal distribution with mean 172.78 and standard deviation 4.15.

If you're looking for a function that returns the same value as a function of alpha instead, there's the inverse survival function, norm.isf(alpha), which returns the value that a fraction alpha of the distribution lies above (and 1-alpha lies below).

import numpy as np
from scipy.stats import norm
alpha = 0.05
v1 = norm.isf(alpha)
v2 = norm.ppf(1-alpha)
np.isclose(v1, v2)     # True
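The same equivalence holds when loc and scale are supplied and when alpha is split across two tails. A small sketch with the numbers from the question (the variable names are illustrative):

```python
from scipy.stats import norm

mean, std = 172.7815, 4.1532  # values from the question
alpha = 0.05

# Upper cut point: alpha/2 of the mass lies above it.
upper_isf = norm.isf(alpha / 2, loc=mean, scale=std)
upper_ppf = norm.ppf(1 - alpha / 2, loc=mean, scale=std)

# Lower cut point: alpha/2 of the mass lies below it.
lower_isf = norm.isf(1 - alpha / 2, loc=mean, scale=std)
lower_ppf = norm.ppf(alpha / 2, loc=mean, scale=std)

print(abs(upper_isf - upper_ppf) < 1e-9)  # True
print(abs(lower_isf - lower_ppf) < 1e-9)  # True
```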
