After reading How Not to Sort by Average Rating, I was curious if anyone has a Python implementation of the lower bound of the Wilson score confidence interval for a Bernoulli parameter?
For more precision: if n*p̂*(1-p̂) is below a certain threshold, say 30-35, I'd use a t-distribution with df = (pos+neg)-2 instead of a normal distribution. Anyhow, just my two cents. – luke14free Apr 05 '12 at 14:12
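A minimal sketch of what this comment suggests, assuming SciPy is available; the threshold of 30 and the df guard are illustrative choices of mine, not something from the article:

import math
from scipy import stats

def wilson_lower_bound_small_n(pos, neg, confidence=0.95):
    # Wilson-style lower bound, swapping the normal quantile for a t quantile
    # with df = (pos + neg) - 2 when n*phat*(1-phat) is small, per the comment above
    n = pos + neg
    if n == 0:
        return 0
    phat = pos / n
    if n * phat * (1 - phat) < 30:
        z = stats.t.ppf(1 - (1 - confidence) / 2, df=max(n - 2, 1))
    else:
        z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return (phat + z*z/(2*n) - z * math.sqrt((phat*(1-phat) + z*z/(4*n)) / n)) / (1 + z*z/n)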
7 Answers
Reddit uses the Wilson score interval for comment ranking; an explanation and a Python implementation can be found here:
#Rewritten code from /r2/r2/lib/db/_sorts.pyx
from math import sqrt

def confidence(ups, downs):
    n = ups + downs
    if n == 0:
        return 0
    z = 1.0  # 1.44 = 85%, 1.96 = 95%
    phat = float(ups) / n
    return ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))
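For example, ranking a handful of items by this lower bound, the way the article describes, could look like this (the (ups, downs) pairs below are made up for illustration):

items = [(10, 2), (100, 40), (1, 0)]   # hypothetical (ups, downs) pairs
ranked = sorted(items, key=lambda ud: confidence(*ud), reverse=True)
print(ranked)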
If you're just going to post a link, do it in a comment. If you're posting it as an answer, provide more info from the content and/or pull out the code so not everyone needs to follow the link, and the answer has value even if the link goes dead. – agf Apr 05 '12 at 13:41
@Vladtn I just updated it with Gullevek's answer. Let me know if there is anything else wrong with it. – Amelio Vazquez-Reina Jul 10 '13 at 14:36
I'd just like to add that for a 95% confidence interval the z score should be 1.96, not 1.6. – Wesley Aug 09 '13 at 23:34
@Wesley yes, and I believe `1.0 = 85%` was also wrong, have updated the answer... there is a table of values here http://www.dummies.com/how-to/content/checking-out-statistical-confidence-interval-criti.html – Anentropic Jan 23 '14 at 07:19
I'm confused about `z = 1.0`. If compared against the `st.norm.ppf`-based solution below, it looks like 1.0 is equivalent to a confidence interval of 68.269% which is a bit of a strange default. – rjh Sep 26 '21 at 20:00
This does not give the result given by NIST in their recipe, which involves solving for the root of an equation. https://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm – WCN Dec 01 '21 at 21:41
I think this one has a wrong Wilson call, because if you have 1 up and 0 down you get NaN, since you can't take the square root of a negative value.
The correct formula can be found by looking at the Ruby example from the article How Not to Sort by Average Rating:
return ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))
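A quick sanity check of the corrected formula, using the confidence function from the accepted answer above:

print(confidence(1, 0))  # 1 up, 0 down: no NaN; with z = 1.0 this evaluates to 0.5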
To get the Wilson CI without continuity correction, you can use `proportion_confint` in `statsmodels.stats.proportion`. To get the Wilson CI with continuity correction, you can use the code below.
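For the no-continuity-correction case, a minimal statsmodels call looks like this (using the first row of the Table 1 test data further down):

from statsmodels.stats.proportion import proportion_confint

low, high = proportion_confint(count=81, nobs=263, alpha=0.05, method='wilson')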
# cf.
# [1] R. G. Newcombe. Two-sided confidence intervals for the single proportion, 1998
# [2] R. G. Newcombe. Interval Estimation for the difference between independent proportions: comparison of eleven methods, 1998
import numpy as np
from statsmodels.stats.proportion import proportion_confint
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
def propci_wilson_cc(count, nobs, alpha=0.05):
    # get confidence limits for proportion
    # using wilson score method w/ cont correction
    # i.e. Method 4 in Newcombe [1];
    # verified via Table 1
    from scipy import stats
    n = nobs
    p = count/n
    q = 1.-p
    z = stats.norm.isf(alpha / 2.)
    z2 = z**2
    denom = 2*(n+z2)
    num = 2.*n*p+z2-1.-z*np.sqrt(z2-2-1./n+4*p*(n*q+1))
    ci_l = num/denom
    num = 2.*n*p+z2+1.+z*np.sqrt(z2+2-1./n+4*p*(n*q-1))
    ci_u = num/denom
    if p == 0:
        ci_l = 0.
    elif p == 1:
        ci_u = 1.
    return ci_l, ci_u
def dpropci_wilson_nocc(a,m,b,n,alpha=0.05):
    # get confidence limits for difference in proportions
    # a/m - b/n
    # using wilson score method WITHOUT cont correction
    # i.e. Method 10 in Newcombe [2]
    # verified via Table II
    theta = a/m - b/n
    l1, u1 = proportion_confint(count=a, nobs=m, alpha=alpha, method='wilson')
    l2, u2 = proportion_confint(count=b, nobs=n, alpha=alpha, method='wilson')
    ci_u = theta + np.sqrt((a/m-u1)**2+(b/n-l2)**2)
    ci_l = theta - np.sqrt((a/m-l1)**2+(b/n-u2)**2)
    return ci_l, ci_u
def dpropci_wilson_cc(a,m,b,n,alpha=0.05):
    # get confidence limits for difference in proportions
    # a/m - b/n
    # using wilson score method w/ cont correction
    # i.e. Method 11 in Newcombe [2]
    # verified via Table II
    theta = a/m - b/n
    l1, u1 = propci_wilson_cc(count=a, nobs=m, alpha=alpha)
    l2, u2 = propci_wilson_cc(count=b, nobs=n, alpha=alpha)
    ci_u = theta + np.sqrt((a/m-u1)**2+(b/n-l2)**2)
    ci_l = theta - np.sqrt((a/m-l1)**2+(b/n-u2)**2)
    return ci_l, ci_u
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# single proportion testing
# these come from Newcombe [1] (Table 1)
a_vec = np.array([81, 15, 0, 1])
m_vec = np.array([263, 148, 20, 29])
for (a,m) in zip(a_vec,m_vec):
    l1, u1 = proportion_confint(count=a, nobs=m, alpha=0.05, method='wilson')
    l2, u2 = propci_wilson_cc(count=a, nobs=m, alpha=0.05)
    print(a,m,l1,u1,' ',l2,u2)

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# difference in proportions testing
# these come from Newcombe [2] (Table II)
a_vec = np.array([56,9,6,5,0,0,10,10],dtype=float)
m_vec = np.array([70,10,7,56,10,10,10,10],dtype=float)
b_vec = np.array([48,3,2,0,0,0,0,0],dtype=float)
n_vec = np.array([80,10,7,29,20,10,20,10],dtype=float)

print('\nWilson without CC')
for (a,m,b,n) in zip(a_vec,m_vec,b_vec,n_vec):
    l, u = dpropci_wilson_nocc(a,m,b,n,alpha=0.05)
    print('{:2.0f}/{:2.0f}-{:2.0f}/{:2.0f} ; {:6.4f} ; {:8.4f}, {:8.4f}'.format(a,m,b,n,a/m-b/n,l,u))

print('\nWilson with CC')
for (a,m,b,n) in zip(a_vec,m_vec,b_vec,n_vec):
    l, u = dpropci_wilson_cc(a,m,b,n,alpha=0.05)
    print('{:2.0f}/{:2.0f}-{:2.0f}/{:2.0f} ; {:6.4f} ; {:8.4f}, {:8.4f}'.format(a,m,b,n,a/m-b/n,l,u))
HTH

The accepted solution seems to use a hard-coded z-value (best for performance).
If you want a direct Python equivalent of the Ruby formula from the blog post with a dynamic z-value (computed from the desired confidence level):
import math
import scipy.stats as st

def ci_lower_bound(pos, n, confidence):
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)
    phat = 1.0 * pos / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
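For example (the counts are made up and the quoted value is approximate):

print(ci_lower_bound(85, 100, 0.95))  # lower 95% bound for 85 successes out of 100, roughly 0.767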

If you'd like to calculate z directly from a confidence bound and want to avoid installing numpy/scipy, you can use the following snippet of code:
import math

def binconf(p, n, c=0.95):
    '''
    Calculate binomial confidence interval based on the number of positive and
    negative events observed. Uses Wilson score and approximations to inverse
    of normal cumulative density function.

    Parameters
    ----------
    p : int
        number of positive events observed
    n : int
        number of negative events observed
    c : optional, [0,1]
        confidence percentage. e.g. 0.95 means 95% confident the probability of
        success lies between the 2 returned values

    Returns
    -------
    theta_low : float
        lower bound on confidence interval
    theta_high : float
        upper bound on confidence interval
    '''
    p, n = float(p), float(n)
    N = p + n
    if N == 0.0: return (0.0, 1.0)
    p = p / N
    z = normcdfi(1 - 0.5 * (1-c))
    a1 = 1.0 / (1.0 + z * z / N)
    a2 = p + z * z / (2 * N)
    a3 = z * math.sqrt(p * (1-p) / N + z * z / (4 * N * N))
    return (a1 * (a2 - a3), a1 * (a2 + a3))


def erfi(x):
    """Approximation to inverse error function"""
    a = 0.147  # MAGIC!!!
    a1 = math.log(1 - x * x)
    a2 = 2.0 / (math.pi * a) + a1 / 2.0
    return sign(x) * math.sqrt(math.sqrt(a2 * a2 - a1 / a) - a2)


def sign(x):
    if x < 0: return -1
    if x == 0: return 0
    if x > 0: return 1


def normcdfi(p, mu=0.0, sigma2=1.0):
    """Inverse CDF of normal distribution"""
    if mu == 0.0 and sigma2 == 1.0:
        return math.sqrt(2) * erfi(2 * p - 1)
    else:
        return mu + math.sqrt(sigma2) * normcdfi(p)
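Note that binconf takes the number of positive and the number of negative events separately, not positives and a total, so a call looks like:

print(binconf(45, 55))  # 45 positive events, 55 negative events (100 trials in total)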

print(binconf(50, 100)) => (0.26291792852889806, 0.41206457669597374) … 50 positive events, 100 total events gives a range where the upper bound is below 0.5? – Thomas David Baker Dec 22 '21 at 10:56
Here is a simplified (no need for numpy) and slightly improved (0 and n values for k do not cause a math domain error) version of the Wilson score confidence interval with continuity correction, adapted from the original source code written by batesbatesbates in another answer. There is also a pure-Python, no-numpy version without continuity correction, with two equivalent ways to calculate it (switchable with the eqmode argument, though both ways give exactly the same non-continuity-correction results):
import math

def propci_wilson_nocc(k, n, z=1.96, eqmode=0):
    # Calculates the Binomial Proportion Confidence Interval using the Wilson Score method without continuation correction
    # Equations eqmode == 1 from: https://en.wikipedia.org/w/index.php?title=Binomial_proportion_confidence_interval&oldid=1101942017#Wilson_score_interval
    # Equations eqmode == 0 from: https://www.evanmiller.org/how-not-to-sort-by-average-rating.html
    # The results should be close to:
    #     from statsmodels.stats.proportion import proportion_confint
    #     proportion_confint(k, n, alpha=0.05, method='wilson')
    # z = 1.44 = 85%, 1.96 = 95%
    if n == 0:
        return 0
    p_hat = float(k) / n
    z2 = z**2
    if eqmode == 0:
        ci_l = (p_hat + z2/(2*n) - z*math.sqrt(max(0.0, (p_hat*(1 - p_hat) + z2/(4*n))/n))) / (1 + z2 / n)
    else:
        ci_l = (1.0 / (1.0 + z2/n)) * (p_hat + z2/(2*n)) - (z / (1 + z2/n)) * math.sqrt(max(0.0, (p_hat*(1 - p_hat)/n + z2/(4*(n**2)))))
    if eqmode == 0:
        ci_u = (p_hat + z2/(2*n) + z*math.sqrt(max(0.0, (p_hat*(1 - p_hat) + z2/(4*n))/n))) / (1 + z2 / n)
    else:
        ci_u = (1.0 / (1.0 + z2/n)) * (p_hat + z2/(2*n)) + (z / (1 + z2/n)) * math.sqrt(max(0.0, (p_hat*(1 - p_hat)/n + z2/(4*(n**2)))))
    return [ci_l, ci_u]
def propci_wilson_cc(n, k, z=1.96):
    # Calculates the Binomial Proportion Confidence Interval using the Wilson Score method with continuation correction
    # i.e. Method 4 in Newcombe [1]: R. G. Newcombe. Two-sided confidence intervals for the single proportion, 1998;
    # verified via Table 1
    # originally written by batesbatesbates https://stackoverflow.com/questions/10029588/python-implementation-of-the-wilson-score-interval/74021634#74021634
    # note the argument order here is (n, k), the reverse of propci_wilson_nocc above
    p_hat = k/n
    q = 1.0 - p_hat
    z2 = z**2
    denom = 2*(n+z2)
    num = 2.0*n*p_hat + z2 - 1.0 - z*math.sqrt(max(0.0, z2 - 2 - 1.0/n + 4*p_hat*(n*q + 1)))
    ci_l = num/denom
    num2 = 2.0*n*p_hat + z2 + 1.0 + z*math.sqrt(max(0.0, z2 + 2 - 1.0/n + 4*p_hat*(n*q - 1)))
    ci_u = num2/denom
    if p_hat == 0:
        ci_l = 0.0
    elif p_hat == 1:
        ci_u = 1.0
    return [ci_l, ci_u]
Note that the returned values will always be bounded between [0.0, 1.0] (because `p_hat` is the ratio k/n); this is why it's a score and not really a confidence interval. It's easy to project back to a confidence interval by multiplying `ci_l * n` and `ci_u * n`; these values will be in the same domain as k and can be plotted alongside it.
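A small usage sketch, assuming the two functions above (the counts are made up):

k, n = 8, 10
ci_l, ci_u = propci_wilson_nocc(k, n, z=1.96)
print(ci_l, ci_u)           # score interval, bounded to [0.0, 1.0]
print(ci_l * n, ci_u * n)   # projected back into the same domain as k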

Here is a much more readable version of how to compute the Wilson score interval without continuity correction, by Bartosz Mikulski:
from math import sqrt

def wilson(p, n, z = 1.96):
    denominator = 1 + z**2/n
    centre_adjusted_probability = p + z*z / (2*n)
    adjusted_standard_deviation = sqrt((p*(1 - p) + z*z / (4*n)) / n)

    lower_bound = (centre_adjusted_probability - z*adjusted_standard_deviation) / denominator
    upper_bound = (centre_adjusted_probability + z*adjusted_standard_deviation) / denominator
    return (lower_bound, upper_bound)
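Note that, unlike most of the snippets above, this function takes the observed proportion p rather than raw counts, so a call might look like this (counts made up):

ups, downs = 90, 10
lower, upper = wilson(ups / (ups + downs), ups + downs)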
