
I want to find the confidence interval of samples that follow a normal distribution.

To test the code, I first create a sample and then try to plot its confidence interval in a Jupyter notebook (Python kernel):

%matplotlib notebook

import pandas as pd
import numpy as np
import statsmodels.stats.api as sms
import matplotlib.pyplot as plt

s= np.random.normal(0,1,2000)
# s= range(10,14)                   <---this sample has the right CI
# s = (0,0,1,1,1,1,1,2)             <---this sample has the right CI

# confidence interval
# I think this is the function I misunderstand
ci=sms.DescrStatsW(s).tconfint_mean()

plt.figure()
_ = plt.hist(s,  bins=100)

# confidence interval left line
one_x12, one_y12 = [ci[0], ci[0]], [0, 20]
# confidence interval right line
two_x12, two_y12 = [ci[1], ci[1]], [0, 20]

plt.plot(one_x12, one_y12, two_x12, two_y12, marker = 'o')

The green and yellow lines are supposed to mark the confidence interval, but they are not in the right position.

I might be misunderstanding this function:

sms.DescrStatsW(s).tconfint_mean()

But the documentation says this function returns a confidence interval.
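`tconfint_mean()` does return a confidence interval, but for the *mean* of the data, not for the range where individual samples fall. Its half-width shrinks like 1/√n, so with 2000 samples the two lines should end up close together near the sample mean. A minimal sketch, assuming `tconfint_mean()` computes the usual Student-t interval for the mean at its default `alpha=0.05`, reproduces that computation with SciPy; `mean_ci_t` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def mean_ci_t(sample, confidence=0.95):
    """Student-t confidence interval for the population mean (hypothetical helper)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(n)                 # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean - t_crit * sem, mean + t_crit * sem


rng = np.random.default_rng(0)
s = rng.normal(0, 1, 2000)

low, high = mean_ci_t(s)
print(low, high)    # a narrow interval around the sample mean
print(high - low)   # roughly 2 * 1.96 / sqrt(2000), i.e. about 0.09

# SciPy's built-in t interval with the same ingredients gives the same endpoints
print(stats.t.interval(0.95, len(s) - 1, loc=s.mean(), scale=stats.sem(s)))
```

With statsmodels installed, `sms.DescrStatsW(s).tconfint_mean()` should agree with `mean_ci_t(s)` up to floating-point rounding.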

[Image: histogram of `s` with the two interval lines from `tconfint_mean()`]

This is the figure I expect:

%matplotlib notebook

import pandas as pd
import numpy as np
import statsmodels.stats.api as sms
import matplotlib.pyplot as plt

s= np.random.normal(0,1,2000)


plt.figure()
_ = plt.hist(s,  bins=100)
# confidence interval left line (centered at 0, the known mean)
sd = np.std(s, axis=0)
one_x12, one_y12 = [sd * -1.96, sd * -1.96], [0, 20]
# confidence interval right line
two_x12, two_y12 = [sd * 1.96, sd * 1.96], [0, 20]

plt.plot(one_x12, one_y12, two_x12, two_y12, marker = 'o')

[Image: the expected plot]
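The ±1.96·σ lines in the expected figure describe the spread of the individual samples (about 95% of them fall between the lines), and the code centers them at 0 because the true mean is known to be 0. A minimal sketch that instead centers them on the sample mean and, as a distribution-free cross-check, compares them with the empirical 2.5th and 97.5th percentiles; `spread_interval` is a hypothetical helper name, and the normality assumption is only needed for the 1.96 factor.

```python
import numpy as np


def spread_interval(sample, z=1.96):
    """Interval expected to hold about 95% of the samples, assuming normality."""
    x = np.asarray(sample, dtype=float)
    center, sd = x.mean(), x.std()
    return center - z * sd, center + z * sd


rng = np.random.default_rng(1)
s = rng.normal(0, 1, 2000)

print(spread_interval(s))             # close to (-1.96, 1.96) for this sample
print(np.percentile(s, [2.5, 97.5]))  # empirical quantiles, no normality assumption
```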

Matthew May

1 Answer


The question seems to be "which function calculates the confidence interval?".

Since the given data follow a normal distribution, this can be done simply with

import scipy.stats
ci = scipy.stats.norm.interval(0.95, loc=0, scale=1)

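0.95 is the confidence level, i.e. the probability mass the interval contains (older SciPy versions named this argument `alpha`, but it is not a significance level). The interval covers the central 95% of the distribution, so its endpoints are the 2.5th and 97.5th percentiles, which lie about 1.96 standard deviations from the mean. (https://en.wikipedia.org/wiki/1.96)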

`loc=0` specifies the mean, and `scale=1` specifies the standard deviation (sigma). (https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule)
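`norm.interval(c, loc, scale)` returns the central interval holding probability `c`, i.e. the `(1 - c)/2` and `(1 + c)/2` quantiles of the distribution. A small sketch checking that against `norm.ppf` (the inverse CDF) and against `loc ± 1.96·scale`; `mu` and `sigma` are example values for a standard normal, not estimates from the question's data.

```python
from scipy.stats import norm

mu, sigma = 0.0, 1.0  # example parameters (standard normal)

ci = norm.interval(0.95, loc=mu, scale=sigma)

# The same endpoints via the percent-point (inverse CDF) function
q = norm.ppf([0.025, 0.975], loc=mu, scale=sigma)

# 1.96 is the rounded 97.5th percentile of the standard normal
z = norm.ppf(0.975)
print(ci, q, z)  # z is about 1.96
```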

See @bogatron's answer to "Compute a confidence interval from sample data" for more details.


The following code generates the plot you want. I seeded the random number generator for reproducibility.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

np.random.seed(100)  # fix the seed so the plot is reproducible
s = np.random.normal(0, 1, 2000)

plt.figure()
_ = plt.hist(s, bins=100)

sigma = 1
mean = 0
# central interval holding 95% of a normal(mean, sigma) distribution
ci = scipy.stats.norm.interval(0.95, loc=mean, scale=sigma)
print(ci)

# confidence interval left line
one_x12, one_y12 = [ci[0], ci[0]], [0, 20]
# confidence interval right line
two_x12, two_y12 = [ci[1], ci[1]], [0, 20]

plt.plot(one_x12, one_y12, two_x12, two_y12, marker='o')

`print(ci)` outputs

(-1.959963984540054, 1.959963984540054)
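As a sanity check on what this interval means (a statement about individual draws, not about the mean), one can count how many of the 2000 seeded samples fall inside it; for a standard normal the fraction should be close to 95%. A sketch assuming the same seed (100) and sample size as the code above:

```python
import numpy as np
from scipy.stats import norm

np.random.seed(100)
s = np.random.normal(0, 1, 2000)

low, high = norm.interval(0.95, loc=0, scale=1)
inside = np.mean((s >= low) & (s <= high))  # fraction of samples inside the interval
print(inside)  # should be close to 0.95
```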

And here is the plot.

[Image: the resulting plot]

Claire
I don't think this is a `confidence interval` in the mathematical sense. The interval does satisfy that a standard normal random variable falls inside it with probability 95%, which is probably what the question was about, but one should not call it a "[confidence interval](https://en.wikipedia.org/wiki/Confidence_interval)". I think it can be interpreted as a special case of a "[prediction interval](https://en.wikipedia.org/wiki/Prediction_interval)" without any epistemic uncertainty. – Jakob Dec 22 '21 at 13:00
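To make the comment's distinction concrete: when the mean and standard deviation are not known but estimated from the sample, the standard 95% prediction interval for one new draw (under an i.i.d. normal assumption) is `mean ± t·s·√(1 + 1/n)`, with `t` the Student-t quantile on `n - 1` degrees of freedom. The extra width relative to `norm.interval` is exactly the epistemic uncertainty the comment mentions. A sketch; `prediction_interval` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def prediction_interval(sample, level=0.95):
    """Interval expected to contain one new draw from the same normal population,
    with the mean and standard deviation estimated from `sample`."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    t_crit = stats.t.ppf((1 + level) / 2, df=n - 1)
    half_width = t_crit * x.std(ddof=1) * np.sqrt(1 + 1 / n)
    return x.mean() - half_width, x.mean() + half_width


rng = np.random.default_rng(100)
small = rng.normal(0, 1, 10)  # few samples, so parameter uncertainty matters

# Known parameters (no epistemic uncertainty): the interval from the answer
known = stats.norm.interval(0.95, loc=0, scale=1)

print(known)
print(prediction_interval(small))
```

As `n` grows, `t·√(1 + 1/n)` approaches 1.96, which is why the fixed ±1.96 interval is a close approximation for 2000 samples.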