Scipy: speed up integration when doing it for the whole surface?

Question

I have a probability density function (pdf) f(x,y). And to get its cumulative distribution function (cdf)F(x,y) at point (x,y), you need to integrate the f(x,y), like this:

In Scipy, I can do it by integrate.nquad:

x, y=5, 4
F_at_x_y = integrate.nquad(f, [[-inf, x],[-inf, y]])

Now, I need the whole F(x,y) in the x-y panel, something look like this:

How can I do that?

The main problem, is that for every point ranging from (-30,-30) to (30,30), I need to do a integrate.nquad from scratch to get the F(x,y). This is too slow.

I'm wondering, since the results are successive (For example, you get F(5,6) by the value of F(4,4), and integrate from the regions between these 2 points), if it is possible to speed up the process? So we don't need to do integrate from scratch at every point, and hence make the process quicker.

Possible useful links:

Multivariate Normal CDF in Python using scipy

http://cn.mathworks.com/help/stats/mvncdf.html

And I'm thinking about borrowing something from Fibonacci Sequence

How to write the Fibonacci Sequence in Python

Do you really need multivariate normals (3 links out of 3), or rather some general distribution? And have you tried the straightforward `Ffun=lambda x,y:integrate.nquad(f, [[-inf, x],[-inf, y]]); Fvals=[Ffun(x,y) for x,y in zip(xarr,yarr)]` and doing what you said, integrating from `x[i-1]` to `x[i]` in a double loop, and comparing the two? — Andras Deak -- Слава Україні, Jan 31 '16 at 15:18
@AndrasDeak, I need it for general distribution. The speed is really slow, because one of my `pdf` is as `kdf` (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity.score_samples), that is the bottelneck. `multivariate normal` 's pdf is much faster than that. — ZK Zhao, Feb 01 '16 at 00:45

score 0 · Answer 1 · answered Feb 19 '16 at 06:49

In the end, this is what I've done:

F is cdf, f is pdf

F(5,5) = F(5,4) + F(4,5) - 2 *F(4,4) + f(5,5)

And loop through the whole surface, you can get the result.

The code would look like this:

def cdf_from_pdf(pdf):
    if not isinstance(pdf[0], np.ndarray):
        original_dim = int(np.sqrt(len(pdf)))
        pdf = pdf.reshape(original_dim,original_dim)
    cdf = np.copy(pdf)
    xdim, ydim = cdf.shape
    for i in xrange(1,xdim):
         cdf[i,0] =  cdf[i-1,0] +  cdf[i,0]
    for i in xrange(1,ydim):
         cdf[0,i] =  cdf[0,i-1] +  cdf[0,i]
    for j in xrange(1,ydim):
        for i in xrange(1,xdim):
             cdf[i,j] =  cdf[i-1,j] +  cdf[i,j-1] -  cdf[i-1,j-1] + pdf[i,j]
    return cdf

This is a very rough approximate, and you can perfect the result by changing the +/- equantion into integration.

As for the original value and the margin, cdf[0,:] and cdf[:,0], you can also use integration as well. In my case, it's very small, so I just use the pdf value.

You can test the function by plotting the cdf, or check the value at the cdf[n,n]

Scipy: speed up integration when doing it for the whole surface?

1 Answers1

Linked