5

I have a probability density function (pdf) f(x,y). And to get its cumulative distribution function (cdf)F(x,y) at point (x,y), you need to integrate the f(x,y), like this: enter image description here

In Scipy, I can do it by integrate.nquad:

x, y=5, 4
F_at_x_y = integrate.nquad(f, [[-inf, x],[-inf, y]])

Now, I need the whole F(x,y) in the x-y panel, something look like this:

enter image description here

How can I do that?


The main problem, is that for every point ranging from (-30,-30) to (30,30), I need to do a integrate.nquad from scratch to get the F(x,y). This is too slow.

I'm wondering, since the results are successive (For example, you get F(5,6) by the value of F(4,4), and integrate from the regions between these 2 points), if it is possible to speed up the process? So we don't need to do integrate from scratch at every point, and hence make the process quicker.


Possible useful links:

Multivariate Normal CDF in Python using scipy

http://cn.mathworks.com/help/stats/mvncdf.html

And I'm thinking about borrowing something from Fibonacci Sequence

How to write the Fibonacci Sequence in Python

Community
  • 1
  • 1
ZK Zhao
  • 19,885
  • 47
  • 132
  • 206
  • Do you really need multivariate normals (3 links out of 3), or rather some general distribution? And have you tried the straightforward `Ffun=lambda x,y:integrate.nquad(f, [[-inf, x],[-inf, y]]); Fvals=[Ffun(x,y) for x,y in zip(xarr,yarr)]` and doing what you said, integrating from `x[i-1]` to `x[i]` in a double loop, and comparing the two? – Andras Deak -- Слава Україні Jan 31 '16 at 15:18
  • @AndrasDeak, I need it for general distribution. The speed is really slow, because one of my `pdf` is as `kdf` (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity.score_samples), that is the bottelneck. `multivariate normal` 's pdf is much faster than that. – ZK Zhao Feb 01 '16 at 00:45

1 Answers1

0

In the end, this is what I've done:

F is cdf, f is pdf

F(5,5) = F(5,4) + F(4,5) - 2 *F(4,4) + f(5,5)

And loop through the whole surface, you can get the result.

The code would look like this:

def cdf_from_pdf(pdf):
    if not isinstance(pdf[0], np.ndarray):
        original_dim = int(np.sqrt(len(pdf)))
        pdf = pdf.reshape(original_dim,original_dim)
    cdf = np.copy(pdf)
    xdim, ydim = cdf.shape
    for i in xrange(1,xdim):
         cdf[i,0] =  cdf[i-1,0] +  cdf[i,0]
    for i in xrange(1,ydim):
         cdf[0,i] =  cdf[0,i-1] +  cdf[0,i]
    for j in xrange(1,ydim):
        for i in xrange(1,xdim):
             cdf[i,j] =  cdf[i-1,j] +  cdf[i,j-1] -  cdf[i-1,j-1] + pdf[i,j]
    return cdf

This is a very rough approximate, and you can perfect the result by changing the +/- equantion into integration.

As for the original value and the margin, cdf[0,:] and cdf[:,0], you can also use integration as well. In my case, it's very small, so I just use the pdf value.

You can test the function by plotting the cdf, or check the value at the cdf[n,n]

ZK Zhao
  • 19,885
  • 47
  • 132
  • 206