17

In order to calculate the CDF of a multivariate normal, I followed this example (for the univariate case) but cannot interpret the output produced by scipy:

from scipy.stats import norm
import numpy as np
mean = np.array([1,5])
covariance = np.matrix([[1, 0.3 ],[0.3, 1]])
distribution = norm(loc=mean,scale = covariance)
print distribution.cdf(np.array([2,4]))

The output produced is:

[[  8.41344746e-01   4.29060333e-04]
 [  9.99570940e-01   1.58655254e-01]]

If the joint CDF is defined as:

P (X1 ≤ x1, . . . ,Xn ≤ xn)

then the expected output should be a real number between 0 and 1.

Community
  • 1
  • 1
statBeginner
  • 829
  • 2
  • 9
  • 23
  • I don't think you can use the `scipy.stats.norm` for the multivariate case. – cel May 31 '15 at 17:43
  • 2
    `scipy.stats` has `multivariate_normal` (http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multivariate_normal.html), but it does not have a `cdf` method. – Warren Weckesser May 31 '15 at 19:19

3 Answers3

21

After searching a lot, I think this blog entry by Noah H. Silbert describes the only readymade code from a standard library that can be used for computing the cdf for a multivariate normal in Python. Scipy has a way to do it but as mentioned in the blog, it is difficult to find. The approach is based on a paper by Alan Genz’s.

From the blog, this is how it works.

from scipy.stats import mvn
import numpy as np
low = np.array([-10, -10])
upp = np.array([.1, -.2])
mu = np.array([-.3, .17])
S = np.array([[1.2,.35],[.35,2.1]])
p,i = mvn.mvnun(low,upp,mu,S)
print p

0.2881578675080012
statBeginner
  • 829
  • 2
  • 9
  • 23
  • Is it possible to pass an array of points to `mvn.mvnun`? I read the code, seems I can only `loop` through it? – ZK Zhao Feb 09 '16 at 01:47
  • @cqcn1991 I am looking for multivariate cdf to plot by passing an array through a file. Were you able to find the solutions? Can you please have a look here http://stackoverflow.com/questions/37057938/bivariate-cdf-ccdf-distribution-python – Sitz Blogz May 06 '16 at 13:10
  • 1
    The problem with ```mvn.mvnun``` is that it is not deterministic. At least, this code gives diferent results each time: https://pastebin.com/L0WSTRui – David Dale Jan 23 '18 at 11:35
  • 2
    [Here's that blog post](https://www.statisticalmodelcitizen.com/2018/11/19/multivariate-normal-cdf-values-in-python/), reposted (the database in my old blog was corrupted a while ago, and I've only fairly recently been able to recover the posts). It's true that the algorithm that Genz developed is not deterministic, but the probabilities that that code produce only differ in the 9th decimal place. For me at least, the benefits of a fast, accurate algorithm for calculating multivariate normal integrals far outweigh the costs of it not being deterministic. – Noah Motion Nov 19 '18 at 16:02
  • The blogpost is on internet archive [here](https://web.archive.org/web/20170616063023/http://www.nhsilbert.net/source/2014/04/multivariate-normal-cdf-values-in-python/) – Nix G-D Mar 30 '22 at 23:39
  • If there are more than two dimension then randomized Korobov rules is used which is random as the name indicates https://github.com/scipy/scipy/blob/8a64c938ddf1ae4c02a08d2c5e38daeb8d061d38/scipy/stats/mvndst.f#L549-L552 – Benjamin Christoffersen Jan 19 '23 at 10:38
  • Note that the 2D case is treated as a special case. It is deterministic and it is likely close to exact if this is the same implementation as used by R (I guess it is): https://github.com/scipy/scipy/blob/8a64c938ddf1ae4c02a08d2c5e38daeb8d061d38/scipy/stats/mvndst.f#L988-L1031 – Benjamin Christoffersen Jan 19 '23 at 10:38
14

The scipy multivariate_normal from v1.1.0 has a cdf function built in now:

from scipy.stats import multivariate_normal as mvn
import numpy as np

mean = np.array([1,5])
covariance = np.array([[1, 0.3],[0.3, 1]])
dist = mvn(mean=mean, cov=covariance)
print("CDF:", dist.cdf(np.array([2,4])))

CDF: 0.14833820905742245

Documentation can be found here.

adamconkey
  • 4,104
  • 5
  • 32
  • 61
0

If you don't care about performance (i.e. perform it only occasionally), then you can create the multivariate normal pdf using multivariate_normal, and then calculate the cdf by integrate.nquad

ZK Zhao
  • 19,885
  • 47
  • 132
  • 206
  • Can you please elaborate on how we can use this? and can this be used to find the expectation of a function which is dependent on a multivariate normal distribution? – vicky113 Aug 18 '17 at 05:26