7

Problem Statement - A random variable X is N(25, 4). Find the indicated percentile for X:

a. The 10th percentile

b. The 90th percentile

c. The 80th percentile

d. The 50th percentile

Attempt 1

My code:

import numpy as np
import math
import scipy.stats
mu=25
sigma=4
a=mu-(1.282*sigma)  # 10th percentile, z = -1.282 from the table
b=mu+(1.282*sigma)  # 90th percentile, z = +1.282 from the table

... and so on for the other percentiles. I took the z-values from the z-score table at https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_probability/bs704_probability10.html

Attempt 2

# sample size is not given in the problem; 10000 is my assumption
X=np.random.normal(25,4,10000)
a_9 = np.percentile(X,10)
b_9 = np.percentile(X,90)
c_9 = np.percentile(X,80)
d_9 = np.percentile(X,50)

But the answers are incorrect as per the hidden test cases of the practice platform. Can anyone please tell me the right way to compute the answers? Is there any scipy.stats function for this?

MVKXXX
  • Why is the second attempt incorrect? Do you have some test cases where it fails? – David Feb 01 '21 at 06:41
  • Yes. My answers are not matching the predefined hidden answers of the test cases. – MVKXXX Feb 01 '21 at 06:58
  • As I mentioned in the code comment, I assumed a sample size of 10000 because it was not given in the question. Maybe that is the issue, I don't know. Is there an alternative way to approach the problem? – MVKXXX Feb 01 '21 at 07:02
  • In attempt 2 you're filling X with random data, so the percentiles will differ on every execution. Z-scores are not fixed values; they are calculated as `z = (x - mu) / sigma`, so filling x with random data will never deliver the same results. Since you already have the z-scores for these percentiles, you can calculate the percentiles as `mu+(z*sigma)`, as in your first example. – RJ Adriaansen Feb 01 '21 at 09:44
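
The point in the last comment can be sketched as follows: a z-score taken from the standard normal makes `mu + z*sigma` deterministic, while a sample percentile moves with each random draw. This is only a minimal sketch, assuming 4 is the standard deviation as in the question's code; the seeds and variable names are illustrative and not part of the original problem.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 25, 4  # assumption: 4 is the standard deviation, as in the question's code

# z-score of the 10th percentile of the standard normal (about -1.2816)
z10 = norm.ppf(0.10)
exact_p10 = mu + z10 * sigma  # deterministic, same value on every run

# Sample-based estimates depend on the random draw (seeds chosen arbitrarily)
est_a = np.percentile(np.random.default_rng(0).normal(mu, sigma, 10_000), 10)
est_b = np.percentile(np.random.default_rng(1).normal(mu, sigma, 10_000), 10)

print(exact_p10, est_a, est_b)
```

The two estimates land near the exact value but will generally not match it, or each other, to the precision a hidden test case is likely to check.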

3 Answers

12

You can use scipy.stats and the distribution's built-in ppf (percent point function) method; see the documentation.

import numpy as np
import scipy.stats as sps
import matplotlib.pyplot as plt

mu = 25
sigma = 4

# define the normal distribution and PDF
dist = sps.norm(loc=mu, scale=sigma)
x = np.linspace(dist.ppf(.001), dist.ppf(.999))
y = dist.pdf(x)

# calculate PPFs
ppfs = {}
for ppf in [.1, .5, .8, .9]:
    p = dist.ppf(ppf)
    ppfs.update({ppf*100: p})

# plot results
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(x, y, color='k')
for i, ppf in enumerate(ppfs):
    ax.axvline(ppfs[ppf], color=f'C{i}', label=f'{ppf:.0f}th: {ppfs[ppf]:.1f}')
ax.legend()
plt.show()

This produces a plot of the PDF with a labelled vertical line at each of the 10th, 50th, 80th and 90th percentiles.

Max Pierini
6

Use the ppf method from scipy.stats.norm (normal distribution).

scipy.stats.norm.ppf(0.1, loc=25, scale=4)

This function is analogous to the qnorm function in R. The ppf method returns the value of the random variable at the given percentile, expressed as a probability between 0 and 1.
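
As a quick sanity check on this approach, `cdf` is the inverse of `ppf`, so feeding the percentile values back into `cdf` should recover the original probabilities. The sketch below assumes, as the question's code does, that the 4 in N(25, 4) is the standard deviation. If the platform follows the N(μ, σ²) convention instead, 4 would be the variance and `scale` would be 2; that reading is shown only as a possibility, not as a confirmed cause of the failing tests. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

probs = np.array([0.10, 0.90, 0.80, 0.50])

# Assumption: 4 is the standard deviation (scale=4), as in the question's code
values_sd = norm.ppf(probs, loc=25, scale=4)

# Alternative reading: N(25, 4) means variance 4, so scale = sqrt(4) = 2
values_var = norm.ppf(probs, loc=25, scale=np.sqrt(4))

# Round trip: cdf(ppf(p)) should give back p
recovered = norm.cdf(values_sd, loc=25, scale=4)

print(values_sd)
print(values_var)
print(recovered)
```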

Ananthu
  • This is cool. To get all the percentiles at once you'd do this: `scipy.stats.norm.ppf([0.1, 0.9, 0.8, 0.5], loc=25, scale=4)`, which returns `[19.87379374, 30.12620626, 28.36648493, 25.]`. The 100th percentile gives `inf`; I'm not from a stats background, so I'm not sure why. – Prox Mar 14 '22 at 08:59
-1
a_9 = 19.88
b_9 = 30.12
c_9 = 28.36
d_9 = 25.00

X = np.random.normal(25,4,10000000)
Clemsang