
I am writing a script in Python 2.7 that will train a neural network. As a part of the main script I need a program that solves a 2D heat conduction partial differential equation. Previously I wrote this program in Fortran and then rewrote it in Python. The time required by Fortran is 0.1 s, while Python requires 13 s! That is absolutely unacceptable for me, since in that case the computational time will be determined by the part of the program that solves the PDE, not by the epochs of training of the neural network.

How to solve that problem?

It seems that I cannot vectorize the matrix since a new element t[i,j] is calculated using the value t[i-1,j], etc.

Here is the part of the code that is running slowly:

while (norm > eps):
    # old value
    t_old = np.copy(t)

    # new value
    for i in xrange(1,n-1):
        for j in xrange(1,m-1):
            d[i] = 0.0
            a[i+1,j] = (0.5*dx/k[i,j] + 0.5*dx/k[i+1,j])
            a[i-1,j] = (0.5*dx/k[i,j] + 0.5*dx/k[i-1,j])
            a[i,j+1] = (0.5*dy/k[i,j] + 0.5*dy/k[i,j+1])
            a[i,j-1] = (0.5*dy/k[i,j] + 0.5*dy/k[i,j-1])
            a[i,j] = a[i+1,j] + a[i-1,j] + a[i,j+1] + a[i,j-1]
            sum = a[i+1,j]*t[i+1,j] + a[i-1,j]*t[i-1,j] + a[i,j+1]*t[i,j+1] + a[i,j-1]*t[i,j-1] + d[i]
            t[i,j] = ( sum + d[i] ) / a[i,j]
            k[i,j] = k_func(t[i,j])
    # matrix 2nd norm
    norm = np.linalg.norm(t-t_old)

1 Answer

Pure Python optimizations

This won't bring all that much, but it is the easiest.

  • Eliminate dead code. In the inner loop, d[i] is set to zero, and then it is added to something else in two places. Adding 0 doesn't change anything, so you can remove d[i] altogether.
  • Calculate things only once. k[i,j], 0.5*dx and 0.5*dy are each used four times, so calculate them once and assign them to local variables.
  • Remove unnecessary array access. In the inner loop, only five elements of the a matrix are calculated and used, so replace those matrix elements by local variables a1 up to and including a5.

The code now looks like this:

px = 0.5*dx
py = 0.5*dy
while (norm > eps):
    # old value
    t_old = np.copy(t)

    # new value
    for i in xrange(1,n-1):
        for j in xrange(1,m-1):
            q = k[i,j]
            a1 = (px/q + px/k[i+1,j])
            a2 = (px/q + px/k[i-1,j])
            a3 = (py/q + py/k[i,j+1])
            a4 = (py/q + py/k[i,j-1])
            a5 = a1 + a2 + a3 + a4
            sum = a1*t[i+1,j] + a2*t[i-1,j] + a3*t[i,j+1] + a4*t[i,j-1]
            t[i,j] = sum / a5
            k[i,j] = k_func(t[i,j])
    # matrix 2nd norm
    norm = np.linalg.norm(t-t_old)

Since your example doesn't give complete working code, I cannot measure the effects.

However, looping in Python is relatively inefficient. For good performance in pure Python it is better to use list comprehensions instead of loops. That is because in comprehensions the looping is done in the Python runtime in C, instead of in Python bytecode. But since we're already dealing with numpy arrays here, I will not expand on this.
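Just as a quick illustration of that point (not applied to the solver itself): the two snippets below build the same list, but the comprehension avoids dispatching an `append` call through Python bytecode on every iteration.

```python
# Plain loop: each iteration executes bytecode and a method call.
squares_loop = []
for n in range(10):
    squares_loop.append(n * n)

# Comprehension: the looping machinery runs inside the interpreter in C.
squares_comp = [n * n for n in range(10)]
```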

Recode your algorithm to use numpy instead of loops

The basic idea behind numpy is that it has optimized routines (written in C or Fortran) for array operations. So for operating on arrays you should use numpy functions instead of loops!

Your loop consists mostly of filling a matrix with values derived from another matrix shifted one column or row. For that you could do something like this.

In this example I'll be shifting k one row down:

In [1]: import numpy as np

In [2]: k = np.arange(1, 26).reshape([5, 5])

In [3]: k
Out[3]: 
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [4]: dx = 0.27

In [5]: 0.5*dx/k[1:,:]
Out[5]: 
array([[0.0225    , 0.01928571, 0.016875  , 0.015     , 0.0135    ],
       [0.01227273, 0.01125   , 0.01038462, 0.00964286, 0.009     ],
       [0.0084375 , 0.00794118, 0.0075    , 0.00710526, 0.00675   ],
       [0.00642857, 0.00613636, 0.00586957, 0.005625  , 0.0054    ]])

In [6]: np.insert(0.5*dx/k[1:,:], 0, 0, axis=0)
Out[6]: 
array([[0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.0225    , 0.01928571, 0.016875  , 0.015     , 0.0135    ],
       [0.01227273, 0.01125   , 0.01038462, 0.00964286, 0.009     ],
       [0.0084375 , 0.00794118, 0.0075    , 0.00710526, 0.00675   ],
       [0.00642857, 0.00613636, 0.00586957, 0.005625  , 0.0054    ]])
Roland Smith
  • 42,427
  • 3
  • 64
  • 94
  • If I understand OP's code, the assignment `k[i,j]=...` modifies a value that is used in the following iterations; is vectorization really possible? – gboffi Sep 15 '19 at 19:49
  • @gboffi Not for the outer `while` loop. But for the inner loops it seems possible. – Roland Smith Sep 15 '19 at 20:23
  • Dear Roland Smith, thanks for your answer. The pure Python optimizations decreased the time from 13 to 9 seconds. I was thinking about using Cython, but I still cannot implement that option since I am encountering many errors =( – Arsen Sep 18 '19 at 16:19
  • @Arsen Replacing the loops by `numpy` functions should work even better. – Roland Smith Sep 18 '19 at 17:11
  • I tried, but my Python skills are not so good. I solved this problem by using Numba (this library makes Python compile the code when it is executed for the first time). When I call the function for the first time it takes ~767 ms, then it takes 46-47 ms each time. That works perfectly for my future goal because I will call this function many times when training my NN. Anyway, thanks for your help! – Arsen Sep 18 '19 at 17:34