
I am looking for a function in Numpy or Scipy (or any rigorous Python library) that will give me the cumulative normal distribution function in Python.

martineau

8 Answers


Here's an example:

>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435

In other words, approximately 95% of the probability mass of the standard normal distribution (mean zero) lies within 1.96 standard deviations of the mean: 0.9750 - 0.0250 = 0.9500 to four decimal places.
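The symmetric-interval figure can also be checked without SciPy. For a standard normal Z, P(-z < Z < z) = Phi(z) - Phi(-z) = erf(z / sqrt(2)), so a minimal standard-library sketch looks like this (the helper name `central_probability` is only illustrative):

```python
import math

def central_probability(z):
    """P(-z < Z < z) for a standard normal Z, via the identity erf(z / sqrt(2))."""
    return math.erf(z / math.sqrt(2.0))

print(central_probability(1.96))  # close to 0.95
print(central_probability(1.0))   # close to 0.6827 (the "68%" rule)
```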

If you need the inverse CDF, use norm.ppf (the percent point function):

>>> norm.ppf(norm.cdf(1.96))
array(1.9599999999999991)
Alex Reynolds
    Also, you can specify the mean (loc) and variance (scale) as parameters, e.g., d = norm(loc=10.0, scale=2.0); d.cdf(12.0); Details here: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.norm.html – Irvan Oct 31 '14 at 13:41
    @Irvan, the scale parameter is actually the standard deviation, NOT the variance. – qkhhly Jun 02 '15 at 19:08
    Why does scipy name these as `loc` and `scale` ? I used the `help(norm.ppf)` but then what the heck are `loc` and `scale` - need a help for the help.. – WestCoastProjects Dec 22 '16 at 20:31
    @javadba - location and scale are more general terms in statistics that are used to parameterize a wide range of distributions. For the normal distribution, they line up with mean and sd, but not so for other distributions. – Michael Ohlrogge Aug 25 '17 at 17:59
    @MichaelOhlrogge . Thx! Here is a page from NIST explaining further http://www.itl.nist.gov/div898/handbook/eda/section3/eda364.htm – WestCoastProjects Aug 25 '17 at 18:03
    This tutorial from scipy explains things very well: https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html "All continuous distributions take loc and scale as keyword parameters to adjust the location and scale of the distribution, e.g. for the standard normal distribution the location is the mean and the scale is the standard deviation." – SummerEla Aug 07 '18 at 19:20
    @Irvan 's addition in one command: `norm.cdf(12, loc=10.0, scale=2.0)` – Qaswed Feb 08 '21 at 15:16
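The loc/scale discussion in these comments boils down to standardization: for the normal distribution, `loc` is the mean and `scale` is the standard deviation, so the CDF at x equals the standard normal CDF at (x - loc) / scale. A minimal standard-library sketch of that identity follows; the function names are hypothetical, and with SciPy installed `norm.cdf(12, loc=10.0, scale=2.0)` should agree with `norm.cdf(1.0)`.

```python
import math

def std_normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, loc=0.0, scale=1.0):
    # loc is the mean; scale is the standard deviation (not the variance).
    if scale <= 0:
        raise ValueError("scale must be positive")
    return std_normal_cdf((x - loc) / scale)

print(normal_cdf(12.0, loc=10.0, scale=2.0))  # same as std_normal_cdf(1.0)
```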

It may be too late to answer the question, but since Google still leads people here, I decided to post my solution.

Since Python 2.7, the math module has included the error function, math.erf(x).

The erf() function can be used to compute traditional statistical functions such as the cumulative standard normal distribution:

from math import erf, sqrt

def phi(x):
    """Cumulative distribution function for the standard normal distribution."""
    return (1.0 + erf(x / sqrt(2.0))) / 2.0

Ref:

https://docs.python.org/2/library/math.html

https://docs.python.org/3/library/math.html

How are the Error Function and Standard Normal distribution function related?

gibbone
WTIFS
    This was exactly what I was looking for. If anyone else wonders how this can be used to calculate the "percentage of data that lies within one standard deviation", it is 1 - (1 - phi(1)) * 2 = 0.6827 ("68% of data within 1 standard deviation"). – Hannes Landeholm Jul 10 '17 at 18:30
    For a general normal distribution, it would be `def phi(x, mu, sigma): return (1 + erf((x - mu) / sigma / sqrt(2))) / 2`. – Bernhard Barker Mar 15 '20 at 19:18
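One caveat with the `1 + erf(...)` form: for large negative x, `erf(x / sqrt(2))` rounds to exactly -1.0, so the sum cancels to 0.0 even though the true probability is tiny but nonzero. Rewriting with the complementary error function, Phi(x) = erfc(-x / sqrt(2)) / 2, avoids that cancellation; `math.erfc` was added to the standard library alongside `math.erf`. A small sketch comparing the two forms (function names are illustrative):

```python
import math

def phi_erf(x):
    # Form used in the answer above: collapses to 0.0 in the far left tail.
    return (1.0 + math.erf(x / math.sqrt(2.0))) / 2.0

def phi_erfc(x):
    # Equivalent form using erfc: no subtraction of nearly equal numbers for x < 0.
    return math.erfc(-x / math.sqrt(2.0)) / 2.0

for x in (-1.0, -10.0, -20.0):
    print(x, phi_erf(x), phi_erfc(x))
```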

Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.

It can be used to get the cumulative distribution function (CDF: the probability that a random sample X will be less than or equal to x) for a given mean (mu) and standard deviation (sigma):

from statistics import NormalDist

NormalDist(mu=0, sigma=1).cdf(1.96)
# 0.9750021048517796

This can be simplified for the standard normal distribution (mu = 0 and sigma = 1), since those are the defaults:

NormalDist().cdf(1.96)
# 0.9750021048517796

NormalDist().cdf(-1.96)
# 0.024997895148220428
Xavier Guihot
    Based on some quick checks, this is significantly faster than norm.cdf from scipy.stats and a fair bit faster than both scipy and math implementations of erf. – dcl Mar 15 '21 at 04:33
    Does this vectorize? Or should someone use the scipy implementation if they need to compute the CDF evaluated at all points in an array? – hasManyStupidQuestions May 16 '21 at 14:33
    Awesome. Maybe you know how to get inverse (normsinv)? Edit: OK, it is inv_cdf(). Thank you! – Juozas Aug 28 '22 at 13:39
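To address the two follow-up questions: `NormalDist` also provides `inv_cdf()` (the inverse CDF, like SciPy's `ppf`), and its methods take one scalar at a time, so for a list of points a comprehension is the straightforward standard-library approach (SciPy's `norm.cdf` accepts NumPy arrays directly if you need vectorized evaluation). A short sketch, with example parameters chosen here purely for illustration:

```python
from statistics import NormalDist

dist = NormalDist(mu=10.0, sigma=2.0)  # example parameters, chosen for illustration

# Inverse CDF: the value below which 97.5% of the distribution lies.
q = dist.inv_cdf(0.975)

# Round trip: cdf(inv_cdf(p)) should recover p up to floating-point error.
p_back = dist.cdf(q)

# NormalDist methods take scalars, so evaluate several points with a comprehension.
points = [6.0, 8.0, 10.0, 12.0, 14.0]
cdf_values = [dist.cdf(x) for x in points]

print(q, p_back)
print(cdf_values)
```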

Adapted from this python-list post: http://mail.python.org/pipermail/python-list/2000-June/039873.html

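from math import exp

def erfcc(x):
    """Complementary error function (polynomial approximation, not exact)."""
    z = abs(x)
    t = 1.0 / (1.0 + 0.5 * z)
    r = t * exp(-z * z - 1.26551223 + t * (1.00002368 + t * (0.37409196 +
        t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 +
        t * (-1.13520398 + t * (1.48851587 + t * (-0.82215223 +
        t * 0.17087277)))))))))
    # The approximation covers x >= 0; use erfc(-x) = 2 - erfc(x) for negative x.
    return r if x >= 0.0 else 2.0 - r

def ncdf(x):
    """Standard normal CDF: Phi(x) = 1 - erfc(x / sqrt(2)) / 2."""
    return 1.0 - 0.5 * erfcc(x / 2 ** 0.5)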
Unknown
    Since the standard library implements math.erf(), there is no need for a separate implementation. – Marc Feb 25 '16 at 20:10
    I was not able to find an answer: where do those numbers come from? – TmSmth Jan 15 '20 at 23:31
    @TmSmth If I had to guess, this looks like some kind of approximation of what is inside the exponential, so you can probably calculate them with some kind of Taylor expansion after fiddling with your function a bit (changing variables, then say r = t * exp(-z**2 - f(t)) and doing a Taylor expansion of f, which can be found numerically). – tbrugere Jun 01 '21 at 07:29
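On where the coefficients come from: they appear to match the `erfcc` routine from Numerical Recipes, which that book documents as having fractional error below about 1.2e-7. Rather than relying on that, you can measure how far any approximate CDF (such as `ncdf` above) strays from one built on the standard library's `math.erfc`. A minimal sketch; the names `reference_ncdf` and `max_abs_error` and the default grid are choices made here, not part of any library:

```python
import math

def reference_ncdf(x):
    # Standard normal CDF from the standard library's erfc (accurate in both tails).
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def max_abs_error(candidate, lo=-8.0, hi=8.0, steps=1600):
    """Largest |candidate(x) - reference_ncdf(x)| over an evenly spaced grid on [lo, hi]."""
    worst = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        worst = max(worst, abs(candidate(x) - reference_ncdf(x)))
    return worst

# Usage: max_abs_error(ncdf), with ncdf defined as in the answer above.
print(max_abs_error(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))))
```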

To build upon Unknown's example, the Python equivalent of the normdist() function implemented in many libraries would be:

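from math import exp, pi, sqrt

# erfcc() is the complementary error function from Unknown's answer above.

def normcdf(x, mu, sigma):
    """Normal CDF with mean mu and standard deviation sigma."""
    t = x - mu
    y = 0.5 * erfcc(-t / (sigma * sqrt(2.0)))
    # Clamp in case the approximation returns slightly more than 1.
    if y > 1.0:
        y = 1.0
    return y

def normpdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    u = (x - mu) / abs(sigma)
    return (1 / (sqrt(2 * pi) * abs(sigma))) * exp(-u * u / 2)

def normdist(x, mu, sigma, f):
    """Return the CDF if f is true, otherwise the PDF."""
    if f:
        return normcdf(x, mu, sigma)
    return normpdf(x, mu, sigma)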
Cerin
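On Python 3.8 or later, the same `normdist(x, mu, sigma, cumulative)` interface can be sketched on top of `statistics.NormalDist`, which provides both `cdf()` and `pdf()`. The function name and boolean flag below simply mirror the answer above; they are not a standard-library API.

```python
from statistics import NormalDist

def normdist(x, mu, sigma, cumulative):
    # Hypothetical helper mirroring the answer above: CDF if cumulative is true, else PDF.
    dist = NormalDist(mu=mu, sigma=sigma)  # raises StatisticsError if sigma < 0
    return dist.cdf(x) if cumulative else dist.pdf(x)

print(normdist(12.0, 10.0, 2.0, True))   # P(X <= 12) for mean 10, sd 2
print(normdist(10.0, 10.0, 2.0, False))  # density at the mean
```

Delegating to `NormalDist` trades the hand-written approximation for the standard library's own error-function-based implementation.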

Alex's answer shows a solution for the standard normal distribution (mean = 0, standard deviation = 1). If you have a normal distribution with mean m and standard deviation s (which is sqrt(var)), you can calculate:

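from scipy.stats import norm

# Example parameters (illustrative): mean m, standard deviation s
m, s = 0.0, 1.0
val, v1, v2 = 1.0, -1.0, 1.0

# P(X < val)
print(norm.cdf(val, m, s))

# P(X > val); norm.sf(val, m, s) gives the same value and is more accurate far in the upper tail
print(1 - norm.cdf(val, m, s))

# P(v1 < X < v2)
print(norm.cdf(v2, m, s) - norm.cdf(v1, m, s))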

Read more about the CDF here, and about SciPy's implementation of the normal distribution (with many formulas) here.

Salvador Dali

Taken from above:

>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435

For a two-tailed test:

>>> import numpy as np
>>> z = 1.96
>>> p_value = 2 * norm.cdf(-np.abs(z))
>>> p_value
0.04999579029644087
David Miller
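The same two-tailed p-value can be computed without SciPy or NumPy using `statistics.NormalDist` (Python 3.8+). By symmetry, P(|Z| >= |z|) is twice the lower-tail probability at -|z|. A minimal sketch, where `two_tailed_p_value` is just an illustrative name:

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    # P(|Z| >= |z|) for a standard normal Z: twice the lower tail at -|z|.
    return 2.0 * NormalDist().cdf(-abs(z))

print(two_tailed_p_value(1.96))   # roughly 0.05
print(two_tailed_p_value(-1.96))  # same value: the sign of z does not matter
```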

It's as simple as this:

import math
def my_cdf(x):
    return 0.5*(1+math.erf(x/math.sqrt(2)))

I found the formula on this page: https://www.danielsoper.com/statcalc/formulas.aspx?id=55
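If you also need the inverse (quantile) function but have neither SciPy nor Python 3.8's `NormalDist.inv_cdf`, one option is to invert `my_cdf` numerically. The sketch below uses plain bisection, which is simple and robust because the CDF is monotonic, though slower than a closed-form approximation. The name `my_inverse_cdf`, the bracket [-40, 40] and the iteration count are assumptions of this sketch.

```python
import math

def my_cdf(x):
    # Standard normal CDF, as in the answer above.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def my_inverse_cdf(p, lo=-40.0, hi=40.0, iterations=200):
    """Find x with my_cdf(x) close to p by bisection (requires 0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if my_cdf(mid) < p:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at or to the left of mid
    return 0.5 * (lo + hi)

print(my_inverse_cdf(0.975))  # close to 1.96
```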