Python: improving performance of a for loop with function call inside that depends on loop index only

Question

I am guilty of writing code in python as if it were Fortran. I am rewriting many parts of a long code already written by myself in Fortran, because i want to extend the code significantly and it is MUCH easier to extend in python for proof of concept work. If it can be sped up enough however, I will simply use python I am not actually interested in turning the crank on the programs over and over. Once an idea is proven to work, I move on to the next problem. This is why I wish to work in python. Unfortunately, right now, it takes several weeks to run as written in python. Even an order of magnitude speedup on the following for loop would make it a feasible testing platform.

A similar question for the R language has been posed

Improving loop performance with function call inside

but surprisingly I don't see one for python. This one is similar but the function in the for loop has dependencies, which mine does not

Improve performance of a for loop in Python (possibly with numpy or numba)

A huge bottleneck is a single for loop

Simple Code

import numpy as np

part  = 3 # a random index of an array, fixed here for example purposes
nmol = 1000
energy = np.zeros((nmol),dtype=np.float_)

for i in range(nmol):
    energy[i] = np.where( part != i,function(part,i),0.0) # if i = part, energy = 0.0

speeding up the function itself is another and separate problem. There must be a way to use numpy or another method to run all calls simultaneously

For example purposes lets say

def function(i,j):
    for k in range(100000): # this loop is simply to make the time about a second or 2
        ener = (i + j) * (i * j) # entirely arbitrary and not my real problem
    return ener

In reality my function calls several function that depend on part and "i".

The full working example is:

import numpy as np
import time as time

def function(i,j):
    for k in range(10000): # this loop is simply to make the time about a second or 2
        ener = (i + j) * (i * j) # entirely arbitrary and not my real problem
    return ener

part  = 3 # a random index of an array, fixed here for example purposes
nmol = 1000
energy = np.zeros((nmol),dtype=np.float_)

start = time.time()
for i in range(nmol):
    energy[i] = np.where( part != i,function(part,i),0.0) # if i = part, energy = 0.0

end = time.time()
print('time: ', end-start)

I am using Python version 3.6. It is imperative that in the loop index "i" does not interact with the index "part".

are you open to multi processing? if yes you could refer to https://docs.python.org/3.4/library/multiprocessing.html?highlight=process. Its pretty hard to see how to improve the iteration speed of the loop. I am not sure you can use list comprehension since energy is a numpy object — Siddharth Chabra, Aug 05 '18 at 19:44
energy simply holds floating point scalars in each element... it can be instead a list or array or whatever is required for speed :). I am open to anything that makes my code faster! — Charlie Crown, Aug 05 '18 at 19:49
The `for` loop in your `function(i,j)` is redundant. You know that, right? — Sheldore, Aug 05 '18 at 20:13
Use numba. Or, if that's too restrictive, use cython. Also, the julia ecosystem is becoming stronger, plus (compiled) julia code can call directly into python code. — J_H, Aug 05 '18 at 20:16
The only reason for the for loop in the function is to make the process take about 2 seconds rather than 0.00002 seconds — Charlie Crown, Aug 05 '18 at 20:20
No, the function is a lot more complicated and also self contained, it can be optimized separately. All want is to call the function simultaneously for each index in the for loop. I have vectorized it myself and posted the answer... — Charlie Crown, Aug 05 '18 at 21:12
Sorry I deleted my comment thinking it was not useful, I still don't get if you vectorized or not the function but I am glad you have found a solution — xdze2, Aug 05 '18 at 21:21
Does the function(s) you wan't to call include any non-numpy code? Otherwise this problem would be really easy to solve using Numba as @J_H already mentioned. — max9111, Aug 06 '18 at 09:15
The code is there, if it is easy to do in numba, why not just add it in... the code works... — Charlie Crown, Aug 08 '18 at 13:04

Charlie Crown · Accepted Answer · 2018-08-05T21:22:33.573

I am quite surprised that I am the one to figure out an answer - I am not intentionally answering my own question... I have just been thinking about it on my own for while now.

rather than a for loop from index 0 to nmol, make an integer array from 0 to nmol. simply call the function by passing the integer array. Thus an array input receives an array output. I modified the function so that it didn't require the constant "part"

This vectorized solution is ~27 times faster than the for loop, giving me the order of magnitude I required.

As the array length of size nmol gets bigger, the speed gains increase, and vice versa.

import numpy as np
import time as time

def function(i):
    for k in range(10000):
        ener = (i + part) + (i * part) # entirely arbitrary and not my real problem
    return ener

part  = 3 # a random index of an array, fixed here for example purposes
nmol = 1000

start = time.time()

part_list = np.arange(0,nmol,1)
part_list = np.delete(part_list,part)  # remove the self index

energy =function(part_list) # calls the function in a vectorized form.

end = time.time()
time2 = end-start

score 1 · Answer 2 · answered Oct 28 '18 at 08:01

based on your vectorized solution, you can get an extra speedup using pythran

Original code:

import numpy as np

def function(i, part):
    for k in range(10000):
        ener = (i + part) + (i * part)
    return ener

and associated benchmark:

python -m timeit -s 'import numpy as np; part  = 3; nmol = 1000; part_list = np.arange(0,nmol,1); part_list = np.delete(part_list, part); from a import function' 'function(part_list, part)'
10 loops, best of 3: 37.3 msec per loop

Then adding a pythran export comment

import numpy as np

#pythran export function(int64[], int64)
def function(i, part):
    for k in range(10000):
        ener = (i + part) + (i * part)
    return ener

And compiling the module with:

pythran a.py

Gives an extra boost:

python -m timeit -s 'import numpy as np; part  = 3; nmol = 1000; part_list = np.arange(0,nmol,1); part_list = np.delete(part_list, part); from a import function' 'function(part_list, part)'
1000000 loops, best of 3: 1.53 usec per loop

Python: improving performance of a for loop with function call inside that depends on loop index only

Simple Code

The full working example is:

2 Answers2