I calculated a 95% confidence interval using scipy and the result is different from what I was expecting.
I am solving a problem where someone rolled a die 20,000 times and observed 3,932 sixes, and I am asked to build a 95% confidence interval for the probability of rolling a six. The number of sixes follows a binomial distribution with n = 20,000 trials and an estimated probability of success of 3,932 / 20,000.
import numpy

# Number of observations
n_obs = 20000
# Observed proportion of successes
p_obs = 3932 / n_obs
# Standard error of the observed proportion
s_obs = numpy.sqrt((p_obs * (1 - p_obs)) / n_obs)
If I use a normal approximation with these parameters, the confidence interval should be p_obs ± 1.96 * s_obs, that is, between 0.1911 and 0.2021.
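For reference, here is how I compute those bounds by hand (a minimal sketch; the variable names match the snippet above, and I use scipy.stats.norm.ppf just to get the two-sided 95% critical value):

```python
import numpy
import scipy.stats

# Same setup as above
n_obs = 20000
p_obs = 3932 / n_obs
s_obs = numpy.sqrt((p_obs * (1 - p_obs)) / n_obs)

# Two-sided 95% critical value of the standard normal (approximately 1.96)
z = scipy.stats.norm.ppf(0.975)

# Normal-approximation confidence interval
lower = p_obs - z * s_obs
upper = p_obs + z * s_obs
print(round(lower, 4), round(upper, 4))  # (0.1911, 0.2021)
```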
However, if I do the following, it returns a completely different interval.
import scipy.stats

# Declare a normal random variable with these parameters
X = scipy.stats.norm(loc=p_obs, scale=s_obs)
# Get interval
X.interval(alpha=0.05)
> (0.1964, 0.1968) # Different from what I was expecting
Why is this happening? Am I missing something?