9

I just wrapped a Fortran 90 subroutine to python using F2PY. The subtlety here is that the Fortran subroutine aslo takes a python call-back function as one of its arguments:

SUBROUTINE f90foo(pyfunc, a)
real(kind=8),intent(in) :: a
!f2py intent(callback) pyfunc
external pyfunc
!f2py real*8 y,x
!f2py y = pyfunc(x)

!*** debug begins***
print *, 'Start Loop'
do i=1,1000
  p = pyfunc(a)
end do
total = etime(elapsed)
print *, 'End: total=', total, ' user=', elapsed(1), ' system=', elapsed(2)
stop
!*** debug ends  ***

The pyfunc is a python function defined elsewhere in my python code. The wrapper works fine, but running the wrapped version above, I got an elapsed time about factor of 5 times longer than what I can get using pure python as follows,

def pythonfoo(k):
    """ k: scalar 
        returns: scalar
    """
    print('Pure Python: Start Loop')
    start = time.time()
    for i in xrange(1000):
        p = pyfunc(k)
    elapsed = (time.time() - start)
    print('End: total=%20f'% elapsed)

So, the question is, what is the overhead coming from? I really want to leave pyfunc as is because it is extremely time-consuming to re-code it into pure fortran function, so is there any way to improve the speed of the wrapper module?

nye17
  • 12,857
  • 11
  • 58
  • 68

1 Answers1

9

In the code you posted, a is double precision float. Passing it from Fortran to Python means wrapping the Fortran double to a PyFloat object, which does have a cost. In the pure Python version, k is a PyFloat and you don't pay the price for wrapping it 1000 times.

Another issue is the function call itself. Calling Python functions from C is already bad performance-wise, but calling them from Fortran is worse, because there is an additional layer of code to transform the Fortran function call conventions (regarding the stack etc.) to C function call conventions. When calling a Python function from C, you need to prepare the arguments as Python objects, generally create a PyTuple object to serve as the *args argument of the Python function, make a lookup in the table of the module to get the function pointer...

Last but not least: you need to take care of the array orders when passing 2D arrays between Fortran and Numpy. F2py and numpy can be smart in that regard, but you'll get performance hits if your Python code is not written to manipulate the arrays in Fortran order.

I don't know what pyfunc is meant to do, but if it is close to what you posted, writing the loop in Python, and calling the function only once will save you time. And if you need the intermediate values (p), let your Python function return a Numpy array with all the intermediate values.

gurney alex
  • 13,247
  • 4
  • 43
  • 57
  • Thanks for the comment! Currently `pyfunc` is mostly taking scalar arguments inside Fortran although it is designed in python to be able to take either nparrays or scalars. – nye17 Sep 23 '11 at 06:30
  • so basically calling python functions in to-be-wrapped fortran codes will cost 5 times more running time than pure python ones and there is no room for improvement? The problem here is that the `pyfunc` is extremely difficulty to code in pure `fortran` while speed is also a great concern... – nye17 Sep 23 '11 at 06:35
  • if speed is a great concern, why don't you try to optimize pyfunc by writing pieces of it in fortran using f2py? – steabert Sep 23 '11 at 08:09
  • @steabert Well, `pyfunc` itself in python is not slow at all, thus only re-coding pieces of it in fortran won't buy me much, and I guess the calling-pyfunc-back-n-forth strategy would even make things worse, as it is the overhead between the two layers that's killing me. – nye17 Sep 23 '11 at 08:23
  • then you will either have to move it entirely to fortran if you want to use it there, or if you just want faster looping, try mapping in python instead of looping, see: http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Loops – steabert Sep 23 '11 at 08:25
  • I'm thinking about passing an array of tabulated outputs from `pyfunc` to the fortran subroutine then doing interpolation inside Fortran. `pyfunc` is a relatively smooth function but with a very large dynamic range in both x and y, so now the problem seems to get down to "how to pass large arrays to fortran efficiently" and "how to interpolate arrays in fortran efficiently"...I'll try to code them up to see how it goes. – nye17 Sep 23 '11 at 08:35
  • create the numpy array in Fortran order. I leave the interpolation to you, there are probably already questions on this topic in StackOverflow. – gurney alex Sep 23 '11 at 09:36
  • @gurneyalex I know how to initialize ndarrays in Fortran order using `numpy.ones`, and `numpy.zeros` etc by specifying the `order="F"` option, but the output from `pyfunc` is created on the fly without F-order, how can I convert the array to Fortran order? and is this conversion a gain at all considering the extra memory/time spent? – nye17 Sep 23 '11 at 09:43
  • @nye17: if it is a 1D array, the order is the same. Otherwise, the array.asfortranarray() method returns a new array in column major order. – gurney alex Sep 23 '11 at 12:27
  • I do not think if the callback performance would differ between Fortran and C. The Fortran interface can be made identical to the C code as if pure C code is calling Python. – Scientist May 28 '21 at 20:39