
I have a Python script in which, as part of an evolutionary optimization algorithm, I evaluate partial derivatives many thousands of times. I've done a line-by-line profile, and this partial-derivative calculation is taking up the majority of the run time. I'm using `scipy.optimize.approx_fprime` to calculate the partial derivatives, and I tried to rewrite it in Cython without much success.

The line-by-line profile is below. My Cythonized version of `scipy.optimize.approx_fprime` is simply called `approx_fprime`.

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
84                                           @profile
100      1500     14889652   9926.4     25.3      df1 = approx_fprime(inp_nom,evaluate1,epsilon)
101      1500     14939889   9959.9     25.4      df2 = scipy.optimize.approx_fprime(inp_upp,evaluate1,epsilon)

Below is my cython file.

import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False) # turn off bounds-checking for entire function
def approx_fprime(np.ndarray xk, f, double epsilon, *args):
    # From scipy.optimize.approx_fprime
    f0 = f(*((xk,) + args))
    cdef np.ndarray grad = np.zeros((len(xk),), float)
    cdef np.ndarray ei = np.zeros((len(xk),), float)
    cdef np.ndarray d
    for k in xrange(len(xk)):
        ei[k] = 1.0
        d = epsilon * ei  # step vector must be rebuilt after ei changes
        grad[k] = (f(*((xk + d,) + args)) - f0) / d[k]
        ei[k] = 0.0
    return grad

I've tried to put in all the relevant type declarations and ensure that it plays nicely with NumPy. Ultimately, though, the proof is in the pudding, as they say: this version is not noticeably faster than the SciPy version. The function only has a few variables, so a single evaluation is not a huge computation and there is probably only room for an incremental improvement per iteration. However, the function gets called over and over because it sits inside an evolutionary optimization algorithm, so I'm hoping that an incremental gain multiplied many times over will have a big payoff.

Could a Cython expert out there take a look at this code and help me figure out whether I'm on the right track, or whether this is just a fool's errand?

Thank you!

hobscrk777
  • The key to success in Cython is to do the math yourself rather than relying on numpy objects and functions, even if they are vectorized. You may need to apply f() to individual elements of the array in a loop. I'm not a Cython expert; I learned some recently in order to write a custom distance-matrix function for huge matrices. Here is my thread: http://stackoverflow.com/questions/25213603/speeding-up-distance-matrix-computation-with-numpy-and-cython The accepted answer helped me a lot in understanding how things work. The problem is different, but you might still find it useful. – ojy Nov 12 '14 at 00:59
  • I'm also no Cython expert, but I do know that evaluating numerical derivatives (as you are attempting here) is often much more expensive than evaluating analytical derivatives. Is there any way you could derive your function's gradients by hand and substitute them (see the sketch after these comments)? I don't know the problem you're trying to solve, so I can't say whether this is feasible. – Kyle_S-C Nov 12 '14 at 01:09
  • You might find [pyautodiff](http://datacommunitydc.org/blog/2013/05/pyautodiff-automatic-differentiation-for-numpy/) helpful. It uses Theano, which can run on the GPU, and you can pass it NumPy types. – ryanpattison Nov 12 '14 at 01:42
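
To illustrate Kyle_S-C's point, here is a minimal comparison using a hypothetical stand-in objective (none of these names come from the question): a hand-derived gradient replaces the n + 1 function calls that a forward-difference approximation needs.

import numpy as np
from scipy.optimize import approx_fprime

def f(x):
    return 0.5 * x.dot(x) + x.sum()    # hypothetical stand-in objective

def grad_f(x):
    return x + 1.0                     # its gradient, derived by hand

x = np.array([1.0, 2.0, 3.0])
numerical = approx_fprime(x, f, 1e-8)  # costs n + 1 calls to f
analytical = grad_f(x)                 # one vectorized expression
print(np.allclose(numerical, analytical, atol=1e-5))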

1 Answer


The first thing to notice is that optimizing code is all about finding the bottlenecks. Typically a few functions or loops consume most of the run time, and those are the right candidates for optimization. So, most importantly: measure your code's performance with a profiler.

The first step in optimizing your Python code is to go through it line by line and check whether each line creates new objects. That matters because object creation is extremely expensive compared to simple arithmetic. Rule of thumb: avoid object creation wherever you can, and above all make sure you don't create any new objects inside your time-critical loops.
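
A small illustration of that rule in plain Python (the names are made up for the example): the out= forms of NumPy ufuncs write into a preallocated buffer instead of allocating a fresh temporary array on every pass.

import numpy as np

x = np.random.rand(1000)
scratch = np.empty_like(x)    # allocated once, before the loop

total = 0.0
for i in range(100):
    # y = 2.0 * x + 1.0 would create two temporary arrays per pass;
    # the out= forms reuse the same scratch buffer instead.
    np.multiply(x, 2.0, out=scratch)
    np.add(scratch, 1.0, out=scratch)
    total += scratch.sum()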

Have a look at `f(*((xk + d,) + args))`. This is perfectly fine Python code, but unsuitable if you need high performance: it builds a new argument tuple and a new temporary array `xk + d` on every pass through the loop. Rewriting it in a way that does not create any objects will probably give you a big performance boost.
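
For instance, here is a sketch of one way to remove those per-iteration allocations, assuming `f` can be called with just the vector (the `args` handling is dropped) and tolerates its input being modified in place between calls:

import numpy as np

def approx_fprime_inplace(xk, f, epsilon):
    n = xk.shape[0]
    grad = np.empty(n)
    x = xk.copy()              # one working copy, reused for every component
    f0 = f(x)
    for k in range(n):
        old = x[k]
        x[k] = old + epsilon   # perturb a single component in place
        grad[k] = (f(x) - f0) / epsilon
        x[k] = old             # restore before the next component
    return grad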

The next step is to add static types. Make sure that you type everything used inside your loops; typing the loop index `k` will probably gain you a lot.

Afterwards you can try to optimize even further with compiler directives, e.g. disabling bounds checking, as sketched below.
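
Putting the typing and directive suggestions together on the in-place version above gives something like this (again assuming `f` takes just the vector); whether the typed loop pays off depends on how cheap `f` is compared to the loop overhead:

import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def approx_fprime_typed(np.ndarray[np.float64_t, ndim=1] xk, f, double epsilon):
    cdef Py_ssize_t k, n = xk.shape[0]
    cdef np.ndarray[np.float64_t, ndim=1] grad = np.empty(n)
    cdef np.ndarray[np.float64_t, ndim=1] x = xk.copy()
    cdef double f0 = f(x)
    cdef double old
    for k in range(n):         # a typed index compiles to a plain C loop
        old = x[k]
        x[k] = old + epsilon
        grad[k] = (f(x) - f0) / epsilon
        x[k] = old
    return grad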

Most important of all: optimize iteratively and check the gain of every step by profiling your code. Most of the time it is not easy to see what the real bottleneck is. Profiling gives you hints: if an optimization did not gain you much, you probably missed the bottleneck.
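
Even a simple `timeit` comparison against the SciPy baseline is enough to keep yourself honest between changes (the import of the compiled module is a placeholder):

import timeit
import numpy as np
import scipy.optimize
# from my_module import approx_fprime   # hypothetical: the compiled version

x = np.random.rand(10)
f = lambda v: np.sum(v ** 2)            # stand-in objective

t = timeit.timeit(lambda: scipy.optimize.approx_fprime(x, f, 1e-8),
                  number=10000)
print("scipy baseline: %.3f s" % t)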

cel
  • Adding to the answer: a quick way to evaluate your code is to use `cython -a` to generate an annotated HTML file. The darker the colour, the more Python-level overhead that statement incurs. – neotrinity Nov 12 '14 at 13:58
  • Thanks for the detailed response @cel. Can you be a little more specific when you say to avoid object creation? If `f` is a function handle, how do I pass that in as an argument to the partial derivative function and make use of it without creating objects excessively? – hobscrk777 Nov 12 '14 at 19:55
  • Well, optimizing is always a trade-off. The best option would be to hard-code your function `f` into the Cython code instead of calling it, but that is probably not an option for you. Still, you could limit the number of calls to `f` if it calculated all partial derivatives for the whole vector at once. Another idea is not to accept additional arguments `args` for `f`, but to expect that `f` already knows all those parameters. Fortunately, in Python that is not a big issue: you can use `functools.partial` to create a suitable function `f'` (see the sketch after these comments). – cel Nov 12 '14 at 20:20
  • Unfortunately, using `partial` will itself cost you some time; see http://stackoverflow.com/questions/17388438/python-functools-partial-efficiency. However, I'm pretty confident that this will still be much faster than the code you have at the moment. You will have to try it, benchmark, and see whether that's right, though. – cel Nov 12 '14 at 20:40
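
A sketch of cel's `functools.partial` suggestion (the objective and its parameters are made up for the example): the extra arguments are baked into `f` once, so the gradient routine never has to rebuild an argument tuple.

import functools
import numpy as np
import scipy.optimize

def objective(x, scale, offset):        # hypothetical extra parameters
    return scale * np.sum(x ** 2) + offset

f = functools.partial(objective, scale=2.0, offset=1.0)

x = np.array([1.0, 2.0, 3.0])
grad = scipy.optimize.approx_fprime(x, f, 1e-8)   # no *args needed any more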