Speeding up the code using numpy

Question

I'm new to python, and I have this code for calculating the potential inside a 1x1 box using fourier series, but a part of it is going way too slow (marked in the code below).

If someone could help me with this, I suspect I could've done something with the numpy library, but I'm not that familiar with it.

import matplotlib.pyplot as plt
import pylab
import sys
from matplotlib import rc
rc('text', usetex=False)
rc('font', family = 'serif')

#One of the boundary conditions for the potential.

def func1(x,n):
    V_c = 1
    V_0 = V_c * np.sin(n*np.pi*x)
    return V_0*np.sin(n*np.pi*x)

#To calculate the potential inside a box:
def v(x,y):
    n = 1;
    sum = 0;
    nmax = 20;

    while n < nmax:
        [C_n, err] = quad(func1, 0, 1, args=(n), );
        sum = sum + 2*(C_n/np.sinh(np.pi*n)*np.sin(n*np.pi*x)*np.sinh(n*np.pi*y));
        n = n + 1;


    return sum;

def main(argv):
    x_axis = np.linspace(0,1,100)
    y_axis = np.linspace(0,1,100)
    V_0 = np.zeros(100)
    V_1 = np.zeros(100)

    n = 4;

    #Plotter for V0 = v_c * sin () x

    for i in range(100):
        V_0[i] = V_0_1(i/100, n)

    plt.plot(x_axis, V_0)
    plt.xlabel('x/L')
    plt.ylabel('V_0')
    plt.title('V_0(x) = sin(m*pi*x/L), n = 4')
    plt.show()

    #Plot for V_0 = V_c(1-(x-1/2)^4)

    for i in range(100):
        V_1[i] = V_0_2(i/100)


    plt.figure()

    plt.plot(x_axis, V_1)
    plt.xlabel('x/L')
    plt.ylabel('V_0')
    plt.title('V_0(x) = 1- (x/L - 1/2)^4)')
    #plt.legend()
    plt.show()

    #Plot V(x/L,y/L) on the boundary:

    V_0_Y = np.zeros(100)
    V_1_Y = np.zeros(100)
    V_X_0 = np.zeros(100)
    V_X_1 = np.zeros(100)

    for i in range(100):
        V_0_Y[i] = v(0, i/100)
        V_1_Y[i] = v(1, i/100)
        V_X_0[i] = v(i/100, 0)
        V_X_1[i] = v(i/100, 1)

    # V(x/L = 0, y/L):

    plt.figure()
    plt.plot(x_axis, V_0_Y)
    plt.title('V(x/L = 0, y/L)')
    plt.show()

    # V(x/L = 1, y/L):

    plt.figure()
    plt.plot(x_axis, V_1_Y)
    plt.title('V(x/L = 1, y/L)')
    plt.show()

    # V(x/L, y/L = 0):

    plt.figure()
    plt.plot(x_axis, V_X_0)
    plt.title('V(x/L, y/L = 0)')
    plt.show()

    # V(x/L, y/L = 1):

    plt.figure()
    plt.plot(x_axis, V_X_1)
    plt.title('V(x/L, y/L = 1)')
    plt.show()

    #Plot V(x,y)
####### 
# This is where the code is way too slow, it takes like 10 minutes when n in v(x,y) is 20.
#######

    V = np.zeros(10000).reshape((100,100))
    for i in range(100):
        for j in range(100):
            V[i,j] = v(j/100, i/100)

    plt.figure()
    plt.contour(x_axis, y_axis, V,  50)
    plt.savefig('V_1')
    plt.show()

if __name__ == "__main__":
    main(sys.argv[1:])

Very quick note, as I'm still reviewing it: `while` loops are on the borderline of non-Pythonic. For example, the `while n < nmax` can be changed to something like `for n in range(1,21):` instead, and so on. — WGS, Oct 21 '14 at 19:32
Is this the full code? On lines 40 and 51 you refer to functions `V_0_1` and `V_0_2`, but they are not defined elsewhere. Try to make a small self-contained example that shows the problem. So plots can be left out too. — Oliver W., Oct 21 '14 at 21:06

score 0 · Answer 1 · edited May 23 '17 at 12:11

You can find how to use FFT/DFT in this document :

Discretized continuous Fourier transform with numpy

Also, regarding your V matrix, there are many ways to improve the execution speed. One is to make sure you use Python 3, or xrange() instead of range() if you a are still in Python 2.. I usually put these lines in my Python code, to allow it to run evenly wether I use Python 3. or 2.*

# Don't want to generate huge lists in memory... use standard range for Python 3.*
range = xrange if isinstance(range(2),
                             list) else range

Then, instead of re-computing j/100 and i/100, you can precompute these values and put them in an array; knowing that a division is much more costly than a multiplication ! Something like :

ratios = np.arange(100) / 100

V = np.zeros(10000).reshape((100,100))
j = 0
while j < 100:
    i = 0
    while i < 100:
        V[i,j] = v(values[j], values[i])
        i += 1
    j += 1

Well, anyway, this is rather cosmetic and will not save your life; and you still need to call the function v()...

Then, you can use weave :

http://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/weave.html

Or write all your pure computation/loop code in C, compile it and generate a module which you can call from Python.

score 0 · Answer 2 · answered Oct 21 '14 at 22:33

You should look into numpy's broadcasting tricks and vectorization (several references, one of the first good links that pops up is from Matlab but it is just as applicable to numpy - can anyone recommend me a good numpy link in the comments that I might point other users to in the future?).

What I saw in your code (once you remove all the unnecessary bits like plots and unused functions), is that you are essentially doing this:

from __future__ import division
from scipy.integrate import quad
import numpy as np
import matplotlib.pyplot as plt

def func1(x,n):
    return 1*np.sin(n*np.pi*x)**2

def v(x,y):
    n = 1;
    sum = 0;
    nmax = 20;

    while n < nmax:
        [C_n, err] = quad(func1, 0, 1, args=(n), );
        sum = sum + 2*(C_n/np.sinh(np.pi*n)*np.sin(n*np.pi*x)*np.sinh(n*np.pi*y));
        n = n + 1;
    return sum;

def main():
    x_axis = np.linspace(0,1,100)
    y_axis = np.linspace(0,1,100)
#######
# This is where the code is way too slow, it takes like 10 minutes when n in v(x,y) is 20.
#######

    V = np.zeros(10000).reshape((100,100))
    for i in range(100):
        for j in range(100):
            V[i,j] = v(j/100, i/100)

    plt.figure()
    plt.contour(x_axis, y_axis, V,  50)
    plt.show()

if __name__ == "__main__":
    main()

If you look carefully (you could use a profiler too), you'll see that you're integrating your function func1 (which I'll rename into the integrand) about 20 times for each element in the 100x100 array V. However, the integrand doesn't change! So you can already bring it out of your loop. If you do that, and use broadcasting tricks, you could end up with something like this:

import numpy as np
from scipy.integrate import quad
import matplotlib.pyplot as plt

def integrand(x,n):
    return 1*np.sin(n*np.pi*x)**2

sine_order = np.arange(1,20).reshape(-1,1,1) # Make an array along the third dimension
integration_results = np.empty_like(sine_order, dtype=np.float)
for enu, order in enumerate(sine_order):
    integration_results[enu] = quad(integrand, 0, 1, args=(order,))[0]

y,x = np.ogrid[0:1:.01, 0:1:.01]

term = integration_results / np.sinh(np.pi * sine_order) * np.sin(sine_order * np.pi * x) * np.sinh(sine_order * np.pi * y)
# This is the key: you have a 3D matrix here and with this summation, 
# you're basically squashing the entire 3D structure into a flat, 2D 
# representation. This 'squashing' is done by means of a sum.
V = 2*np.sum(term, axis=0)  

x_axis = np.linspace(0,1,100)
y_axis = np.linspace(0,1,100)
plt.figure()
plt.contour(x_axis, y_axis, V,  50)
plt.show()

which runs in less than a second on my system. Broadcasting becomes much more understandable if you take pen&paper and draw out the vectors that you are "broadcasting" as if you were constructing a building, from basic Tetris-blocks.

These two versions are functionally the same, but one is completely vectorized, while the other uses python for-loops. As a new user to python and numpy, I definitely recommend reading through the broadcasting basics. Good luck!

Oh, I also want to note that your integrand is always equal to `.5` as long as n is an integer. For completeness, I've left the quadrature in the code, as you might want to change the integrand to something more complex. — Oliver W., Oct 21 '14 at 22:53

Speeding up the code using numpy

2 Answers2