1

I'm performing a one dimensional integral over an array as described here and here. As stated in those answers, I can not use scipy.integrate.quad to vectorize the integrals over the array because it employs an adaptive algorithm, which is why I use numpy.trapz below (I could also have used scipy.integrate.simps or scipy.integrate.romb)

import numpy as np

# Define some random data
Ndata = 500
data1 = np.random.uniform(.1, .8, Ndata)
data2 = np.random.uniform(.01, .2, Ndata)

# 1D function to integrate
def distFunc(x, c1=1., c2=.1):
    B1 = ((data1 - (1. / x)) / data2)**2
    B2 = ((x - c1) / c2)**2
    f = np.exp(-.5 * B1) * np.exp(-.5 * B2) / c2
    return f

# Values in x to evaluate the integral.
x = np.linspace(.1, 10, 100).reshape(-1, 1)

# Integral in x for each of the Ndata values defined above.
int_exp = np.trapz(distFunc(x), x, axis=0)

This works fine for a single dimension, but now I'd like to perform a double integral, replacing the c2 constant by a variable:

# 2D function to integrate
def distFunc(x, y, c1=1.):
    B1 = ((data1 - (1. / x)) / data2)**2
    B2 = ((x - c1) / y)**2
    f = np.exp(-.5 * B1) * np.exp(-.5 * B2) / y
    return f

For what I could gather, the only available function is scipy.integrate.dblquad bu that means I could no longer apply the integral to an entire array in a single pass, and I would have to use a for loop which is considerably slower.

Is there any solution to this? I'm open to almost anything as long as it is reasonable performance-wise (I'm plugging this double-integral into an MCMC and it needs to be be evaluated millions of times)


ADD

This is my attempt with a 1-dimensional integral using scipy.integrate.quad inside a for loop (ie: one data value in the array at a time). The process is more than 50x slower than using np.trapz over the entire array.

import numpy as np
from scipy.integrate import quad

# Define some random data
Ndata = 500
data1 = np.random.uniform(.1, .8, Ndata)
data2 = np.random.uniform(.01, .2, Ndata)

# Function to integrate
def distFunc2(x, data1_i, data2_i, c1=1., c2=.1):
    B1 = ((data1_i - (1. / x)) / data2_i)**2
    B2 = ((x - c1) / c2)**2
    f = np.exp(-.5 * B1) * np.exp(-.5 * B2) / c2
    return f

s = t.time()
int_exp = np.zeros(Ndata)
for i in range(Ndata):
    int_exp[i] = quad(distFunc2, .1, 10., args=(data1[i], data2[i]))[0]
print(t.time() - s)

ADD 2

Testing the answer given below, it sort of works, with the caveat that at times it can fail super hard compared to dblquad (which is far slower but much more precise). I'm guessing this is related to the algorithm used by np.trapz.

# Define some random data
Ndata = 10
data1 = np.random.uniform(.1, 10., Ndata)
data2 = np.random.uniform(.1, .2, Ndata)

c1 = .1
print(integ_dblquad(c1, data1, data2))
print(integ_trapz(c1, data1, data2))

def integ_dblquad(c1, data1, data2):
    def distFunc(y, x, d1_i, d2_i, c1):
        B1 = ((d1_i - (1. / x)) / d2_i)**2
        B2 = ((x - c1) / y)**2
        return (np.exp(-.5 * B1) / d2_i) * np.exp(-.5 * B2) / y

    int_exp = np.zeros(data1.size)
    for i in range(data1.size):
        int_exp[i] = dblquad(
            distFunc, .1, 10., lambda x: 0, lambda x: 5.,
            args=(data1[i], data2[i], c1))[0]

    return np.sum(np.log(int_exp))

def integ_trapz(c1, data1, data2):
    def distFunc2d(x, y):
        B1 = ((data1 - (1. / x)) / data2)**2
        B2 = ((x - c1) / y)**2
        return (np.exp(-.5 * B1) / data2) * np.exp(-.5 * B2) / y

    # Values in x to evaluate the integral.
    x = np.linspace(.1, 10, 1000)
    y = np.linspace(.1, 5., 1000)
    # Integral in x for each of the Ndata values defined above.
    int_exp2d = np.trapz(np.trapz(distFunc2d(x[:, np.newaxis], y[:, np.newaxis, np.newaxis]), y, axis=0), x, axis=0)

    return np.sum(np.log(int_exp2d))
Gabriel
  • 40,504
  • 73
  • 230
  • 404

1 Answers1

3

If I've understood your problem correctly, you can just call trapz twice:

import numpy as np

# Define some random data
Ndata = 500
data1 = np.random.uniform(.1, .8, Ndata)
data2 = np.random.uniform(.01, .2, Ndata)

# 1D function to integrate
def distFunc(x, c1=1., c2=.1):
    B1 = ((data1 - (1. / x)) / data2)**2
    B2 = ((x - c1) / c2)**2
    f = np.exp(-.5 * B1) * np.exp(-.5 * B2) / c2
    return f

def distFunc2d(x, y, c1=1.):
    B1 = ((data1 - (1. / x)) / data2)**2
    B2 = ((x - c1) / y)**2
    f = np.exp(-.5 * B1) * np.exp(-.5 * B2) / y
    return f

# Values in x to evaluate the integral.
x = np.linspace(.1, 10, 100)
y = np.linspace(.1, 10, 100)

# Integral in x for each of the Ndata values defined above.
int_exp = np.trapz(distFunc(x[:,np.newaxis]), x, axis=0)
int_exp2d = np.trapz(np.trapz(distFunc2d(x[:,np.newaxis],y[:,np.newaxis,np.newaxis]), y, axis=0), x, axis=0)
user545424
  • 15,713
  • 11
  • 56
  • 70
  • I've edited my question commenting this answer. It *sort of* works, except it will fail miserably at times. I'm guessing this is related to `np.trapz` and not to your implementation, so if no better answer comes along, I'll accept it. Thank you! – Gabriel Nov 15 '18 at 23:23
  • 1
    No problem. Yeah, trapz behind the scenes is just doing the trapezoidal integration, so you have to evaluate the function at a fine enough scale that the linear approximation it is making is a good one. It depends on the problem whether the speed up you get by evaluating everything behind the scenes in C makes up for the overhead of the extra function calls required to evaluate it at a fine grained scale. If you really need to speed it up you can of course rewrite it in C. The GSL library has functions for performing CQUAD doubly adaptive integration. – user545424 Nov 16 '18 at 16:24
  • See https://www.gnu.org/software/gsl/manual/html_node/CQUAD-doubly_002dadaptive-integration.html. – user545424 Nov 16 '18 at 16:24
  • Thank you. I actually found that I can solve the integral in `y` as a gamma function, so I'm back to a single integral which runs reasonably well with `trapz`. – Gabriel Nov 16 '18 at 16:58