
I am trying to speed up a specific (numerical) integral in Python. I have evaluated it in Mathematica, where it takes 14 s; in Python it takes 15.6 minutes!

The integral I want to evaluate is of the form:

[image: the integral expression]

The python code is the following:

import math
import cmath
from itertools import product

import numpy as np
import scipy.integrate as integ
from mpmath import hermite

def light_nm( dipol, n, m, t):
    mat_elem = light_amp(n)*light_amp_conj(m)*coef_ground( dipol, n,t)*np.conj(coef_ground( dipol, m,t)) +  \
              light_amp(n+1)*light_amp_conj(m+1)*coef_excit( dipol, n+1,t)*np.conj(coef_excit( dipol, m+1,t))
    return mat_elem


def light_nm_dmu( dipol, n, m, t):
    mat_elem = light_amp(n)*light_amp_conj(m)*(coef_ground_dmu( dipol, n,t)*np.conj(coef_ground( dipol, m,t)) + coef_ground( dipol, n,t)*np.conj(coef_ground_dmu( dipol, m,t)) )+    \
            light_amp(n+1)*light_amp_conj(m+1)*(coef_excit_dmu( dipol, n+1,t)*np.conj(coef_excit( dipol, m+1,t)) + coef_excit( dipol, n+1,t)*np.conj(coef_excit_dmu( dipol, m+1,t)))
    return mat_elem

def prob(dipol, t, x, thlo, cutoff, n, m):
      temp = complex( light_nm(dipol, n, m, t)* cmath.exp(1j*thlo*(n-m)-x**2)*\
                             hermite(n,x)*hermite(m,x)/math.sqrt(2**(n+m)*math.factorial(m)*math.factorial(n)*math.pi))
      return np.real(temp)

def derprob(dipol, t, x, thlo, cutoff, n, m):
      temp = complex( light_nm_dmu(dipol, n, m, t)* cmath.exp(1j*thlo*(n-m)-x**2)*\
                              hermite(n,x)*hermite(m,x)/math.sqrt(2**(n+m)*math.factorial(m)*math.factorial(n)*math.pi))
      if np.imag(temp)>10**(-6):
          print(t)
      return np.real(temp)

def integrand(dipol, t, thlo, cutoff,x):
    return  1/np.sum(np.array([ prob(dipol,t,x,thlo,cutoff,n,m) for n,m in product(range(cutoff),range(cutoff))]))*\
         np.sum(np.array([ derprob(dipol,t,x,thlo,cutoff,n,m) for n,m in product(range(cutoff),range(cutoff))]))**2

def cfi(dipol, t, thlo, cutoff, a):
    global alpha
    alpha = a
    
    temp_func_real = lambda x: np.real(integrand(dipol,t, thlo, cutoff, x))
    temp_real = integ.quad(temp_func_real, -8, 8)
    return  temp_real[0]

The `hermite` function is called from the mpmath library. Is there any way to make this code run faster?
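For context, the per-call `mpmath.hermite` evaluations are a likely bottleneck, since `quad` calls the integrand at many `x` values and each call re-evaluates every `H_n(x)` from scratch. One possible alternative (a sketch; `hermite_table` is a name I made up, not part of the code above) is to build all `H_0(x)…H_{cutoff-1}(x)` at once with the standard three-term recurrence, which is plain NumPy arithmetic:

```python
import numpy as np

def hermite_table(nmax, x):
    """Return an array H with H[n] = H_n(x) (physicists' Hermite polynomials),
    built via the recurrence H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x)."""
    H = np.empty(nmax)
    H[0] = 1.0                       # H_0(x) = 1
    if nmax > 1:
        H[1] = 2.0 * x               # H_1(x) = 2x
    for n in range(1, nmax - 1):
        H[n + 1] = 2.0 * x * H[n] - 2.0 * n * H[n - 1]
    return H
```

With such a table, `hermite(n, x) * hermite(m, x)` inside the double sum becomes an outer product `H[:, None] * H[None, :]`, computed once per `x` instead of `cutoff**2` times.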

Thank you!

UPDATED: I added the whole code (sorry for the delay). The `light_nm_dmu` functions are similar to `light_nm`. I tried the answer, but I got a "TypeError: only size-1 arrays can be converted to Python scalars" in the `light_amp` function, so I vectorised `prob` and `derprob`.

New time is 886.7 s ≈ 14.8 min for the same evaluation, `cfi(0.1, 1, 0, 40, 1)`.
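That result makes sense: wrapping scalar-only functions with `np.vectorize` removes the `TypeError` but does not actually speed anything up, because it is still a Python-level loop under the hood. A minimal sketch of what the vectorisation does (with a toy stand-in for `prob`):

```python
import numpy as np

def prob_scalar(n, m):
    # toy stand-in for the real prob(dipol, t, x, thlo, cutoff, n, m),
    # which only accepts scalar n and m
    return 1.0 / (n + m + 1)

# np.vectorize broadcasts a scalar-only function over array arguments;
# convenient, but it is a loop in disguise, not true NumPy vectorisation
prob_vec = np.vectorize(prob_scalar)
grid = prob_vec(np.arange(3)[:, None], np.arange(3)[None, :])
```

To see a real speedup, the inner helpers themselves (`light_nm`, `light_amp`, the Hermite evaluations) would have to operate on whole arrays of `n`, `m` at once.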

  • Can you provide a [more complete working example](https://stackoverflow.com/help/minimal-reproducible-example)? For example, where the `light_nm`, `derprob` and `hermite` functions are coming from, and which parameters cause performance issues? – Jérôme Richard Sep 25 '20 at 10:17
  • Not sure if GPU parallelization would help here. You could try CuPy (the fastest way to find out), or implement it in Tensorflow or Pytorch (though you would have to figure out how to parallelize it). – runDOSrun Sep 25 '20 at 10:28
  • I second the first remark. I find the problem interesting, but do not want to spend time trying to figure out how to run the code. If it's not copy-paste, I'm out, and so are many others. Best make sure your problems are always copy-pastable. – Nico Schlömer Sep 25 '20 at 13:38

1 Answer


Suggest using:

  1. Vectorization: see *numpy - evaluate function on a grid of points*

  2. Caching to speed up factorial computation on a large set of numbers, i.e. *Is math.factorial memoized?* (modifying the answer by Domenico De Felice)
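Point 2 can also be done with the standard library alone; a minimal sketch using `functools.lru_cache` (the name `cached_factorial` is mine, not from the linked answer):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_factorial(n):
    # memoized recursive factorial: repeated (n, m) pairs in the
    # double sum hit the cache instead of recomputing
    return 1 if n < 2 else n * cached_factorial(n - 1)
```

The hand-rolled dict version below avoids recursion-depth limits for very large `n`, which is why it is used in the updated code.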

Updated Code

# use cached factorial function
def prob(dipol, t, x, thlo, cutoff, n, m):
      temp = complex( light_nm(dipol, n, m, t)* cmath.exp(1j*thlo*(n-m)-x**2)*\
                             hermite(n,x)*hermite(m,x)/math.sqrt(2**(n+m)*factorial(m)*factorial(n)*math.pi))
      return np.real(temp)

# Vectorize computation
def integrand(dipol, t, thlo, cutoff,x):
    xaxis = np.arange(0, cutoff)
    yaxis = np.arange(0, cutoff)

    return  1/np.sum(prob(dipol,t,x,thlo,cutoff,xaxis[:, None] , yaxis[None, :]))*\
         np.sum(derprob(dipol,t,x,thlo,cutoff,xaxis[:, None] , yaxis[None, :]))**2

# unchanged
def cfi(dipol, t, thlo, cutoff, a):
    global alpha
    alpha = a
    
    temp_func_real = lambda x: np.real(integrand(dipol,t, thlo, cutoff, x))
    temp_real = integ.quad(temp_func_real, -8, 8)
    return  temp_real[0]

# Cached factorial
def factorial(num, fact_memory={0: 1, 1: 1, 'max': 1}):
    """Cached factorial since we're computing on lots of numbers."""
    # the mutable default argument is deliberately used as a persistent cache
    # Factorial is defined only for non-negative numbers
    assert num >= 0

    if num <= fact_memory['max']:
        return fact_memory[num]

    for x in range(fact_memory['max']+1, num+1):
        fact_memory[x] = fact_memory[x-1] * x
        
    fact_memory['max'] = num
    return fact_memory[num]
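Going one step further, the whole constant factor containing the factorials depends only on `(n, m)`, so it could be cached as a unit and reused across every `x` value that `quad` evaluates. A sketch, with `norm` as a hypothetical helper name:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def norm(n, m):
    # cached 1/sqrt(2**(n+m) * m! * n! * pi); this is constant in x,
    # so it needs to be computed only once per (n, m) pair
    return 1.0 / math.sqrt(2 ** (n + m) * math.factorial(m)
                           * math.factorial(n) * math.pi)
```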
DarrylG
  • Using dict for memoization is generic but slower than just a plain array here (and enough here). Actually, caching can be applied to the whole expression `1/math.sqrt(2**(n+m)*factorial(m)*factorial(n)*math.pi)`. – Jérôme Richard Sep 25 '20 at 13:01
  • @JérômeRichard--The idea behind just caching factorial is that when you have factorial 50 already cached, you can quickly calculate factorials of other values by multiplying a subset of numbers (i.e. if factorial 50 is cached, to compute factorial 55 you only need to compute 55*54*53*52*51*cache_value(50)). Regarding dict vs. array: in some other projects, the difference in caching performance was small compared to the other calculations in the program, so I didn't bother trying to improve Domenico De Felice's posted answer. – DarrylG Sep 25 '20 at 13:11