9

Context: I've recently discovered the alglib library (for numerical computation), which seems to be the thing I was looking for (robust interpolation, data analysis...) and could not really find in numpy or scipy.

However, I'm concerned about the fact that (eg. for interpolation) it does not accept numpy array as valid input format, but only regular python list objects.

Problem: I've dug a bit into the code and documentation, and found (as expected) that this list format is just for transition, since the library will anyway convert it into ctypes (the cpython library is just an interface for the underlying C/C++ library).

That is where comes my concern: inside my code, I'm working with numpy arrays, because it is a big performance boost for the scientific calculations I'm performing on it. Thus I fear having to convert any data passed to alglib routines into list (which will be converted into ctypes) will have a huge impact on the performance (I'm working with arrays that could have hundreds of thousands floats inside, and with thousands of arrays).

Question: Do you think that I will indeed have a performance loss, or do you think I should start modifying the alglib code (only the python interface) so that it could accept numpy arrays, and make only one conversion (from numpy arrays to ctypes)? I don't even know if this is feasible, because it is quite a big library... Maybe you guys have better ideas or suggestions (even on similar but different libraries)...


EDIT

It seems my problem is not getting a lot of interest, or that my question is not clear/relevant. Or maybe nobody has a solution or advice, but I doubt with so many experts around :) Anyway, I've written a small, quick and dirty test code to illustrate the problem...

#!/usr/bin/env python

import xalglib as al
import timeit
import numpy as np

def func(x):
    return (3.14 *x**2.3 + x**3 -x**2.34 +x)/(1.+x)**2

def fa(x, y, val=3.14):
    s = al.spline1dbuildakima(x, y)
    return (al.spline1dcalc(s, val), func(val))

def fb(x, y, val=3.14):
    _x = list(x)
    _y = list(y)
    s = al.spline1dbuildakima(_x, _y)
    return (al.spline1dcalc(s, val), func(val))

ntot = 10000
maxi = 100
x = np.random.uniform(high=maxi, size=ntot)
y = func(x)
xl = list(x)
yl = list(y)

print "Test for len(x)=%d, and x between [0 and %.2f):" % (ntot, maxi)
print "Function: (3.14 *x**2.3 + x**3 -x**2.34 +x)/(1.+x)**2"
a, b = fa(xl, yl)
err = np.abs(a-b)/b * 100
print "(x=3.14) interpolated, exact =", (a, b)
print "(x=3.14) relative error should be <= 1e-2: %s (=%.2e)" % ((err <= 1e-2), err)

if __name__ == "__main__":
    t = timeit.Timer(stmt="fa(xl, yl)", setup="from __main__ import fa, xl, yl, func")
    tt = timeit.Timer(stmt="fb(x, y)", setup="from __main__ import fb, x, y, func")
    v = 1000 * t.timeit(number=100)/100
    vv = 1000 * tt.timeit(number=100)/100
    print "%.2f usec/pass" % v
    print "%.2f usec/pass" % vv
    print "%.2f %% less performant using numpy arrays" % ((vv-v)/v*100.)

and running it, I'm getting:

"""
Test for len(x)=10000, and x between [0 and 100.00):
Function: (3.14 *x**2.3 + x**3 -x**2.34 +x)/(1.+x)**2
(x=3.14) interpolated, exact = (3.686727834705164, 3.6867278531266905)
(x=3.14) relative error should be <= 1e-2: True (=5.00e-07)
25.85 usec/pass
28.46 usec/pass
10.09 % less performant using numpy arrays
"""

Performance loss oscillates between about 8% and 14%, which is huge to me...

Eric O. Lebigot
  • 91,433
  • 48
  • 218
  • 260
mhavel
  • 735
  • 1
  • 7
  • 19

3 Answers3

5

You can create you own wrap function that pass numpy array's data buffer to the vector's data pointer directly, this will not copy the data, and speedup your wrap function a lot. The following code pass x.ctypes.data to x_vector.ptr.p_ptr, where x is a numpy array.

when you pass numpy array, you must ensure that the array has it's elements in contiguous memory. The following code doen't check this.

import xalglib as al
import numpy as np
import ctypes

def spline1dbuildakima(x, y):
    n = len(x)
    _error_msg = ctypes.c_char_p(0)
    __c = ctypes.c_void_p(0)
    __n = al.c_ptrint_t(n)
    __x = al.x_vector(cnt=n, datatype=al.DT_REAL, owner=al.OWN_CALLER, 
                      last_action=0,ptr=al.x_multiptr(p_ptr=x.ctypes.data))
    __y = al.x_vector(cnt=n, datatype=al.DT_REAL, owner=al.OWN_CALLER, 
                      last_action=0,ptr=al.x_multiptr(p_ptr=y.ctypes.data))

    al._lib_alglib.alglib_spline1dbuildakima(
        ctypes.byref(_error_msg), 
        ctypes.byref(__x), 
        ctypes.byref(__y), 
        ctypes.byref(__n), 
        ctypes.byref(__c))

    __r__c = al.spline1dinterpolant(__c)
    return __r__c    

def func(x):
    return (3.14 *x**2.3 + x**3 -x**2.34 +x)/(1.+x)**2

def fa(x, y, val=3.14):
    s = spline1dbuildakima(x, y)
    return al.spline1dcalc(s, val), func(val)

def fb(x, y, val=3.14):
    s = al.spline1dbuildakima(x, y)
    return al.spline1dcalc(s, val), func(val)

ntot = 10000
maxi = 100
x = np.random.uniform(high=maxi, size=ntot)
y = func(x)
xl = list(x)
yl = list(y)

import time
start = time.clock()
for i in xrange(100):
    a, b = fa(x, y)
print time.clock()-start
err = np.abs(a-b)/b * 100
print a, b, err

start = time.clock()
for i in xrange(100):
    a, b = fb(xl, yl)
print time.clock()-start
err = np.abs(a-b)/b * 100
print a, b, err

The output is:

0.722314760822 <- seconds of numpy array version
3.68672728107 3.68672785313 1.55166878281e-05
3.22011891502  <- seconds of list version
3.68672728107 3.68672785313 1.55166878281e-05
HYRY
  • 94,853
  • 25
  • 187
  • 187
  • 1
    you could call `x = np.ascontiguousarray(x)` to make sure it is contiguous in memory. – jfs Feb 27 '12 at 01:54
  • wow, nice! Thanks a lot HYRY. I've never used wrapping methods before, so I was reading some documentation that EOL nicely pointed out. Your example come at a good time for me, so I now have something concrete to train with. – mhavel Feb 27 '12 at 05:24
  • @Sebastian Thanks you for your comment. This will help me too. – mhavel Feb 27 '12 at 05:25
3

Making the C++ alglib accept NumPy arrays is certainly doable: SciPy does this. The question is really how difficult it is. You might want to try one of the semi-automatic C++ → Python wrapping program, like (starting with the one I would start with–warning: I'm no expert):

On a different subject: I have used interpolation splines in SciPy with success, in the past. I am not sure that this would be sufficient for your needs, though, since you did not find everything you wanted in SciPy.

Eric O. Lebigot
  • 91,433
  • 48
  • 218
  • 260
  • Thanks for your answer EOL. The splines in SciPy package are working as expected, but I was missing some splines like Akima (I do have some data that require more than a linear interpolation, but are not "well-behaved" enough to use a cubic or higher order spline without getting some unwanted artifacts such oscillations in the results). Also, alglib is having some other data analysis and statistical tools that SciPy does not have, so it makes sense to me to use such an external library. For the wrapping side, it seems alglib developpers are already using one for C->Python... – mhavel Feb 26 '12 at 00:55
  • @mhavel: It is possible to wrap C++ code so that it directly uses NumPy arrays instead of Python lists; it is surprising that alglib's Python wrapper does not do/allow this. It is probably easier to wrap the functions that you need yourself than to modify the alglib Python wrapper: you will be able to udpate your wrapper to follow alglib's updates more easily, if need be. The three wrappers that I listed should really help you a lot. – Eric O. Lebigot Feb 26 '12 at 01:00
  • oh ok:) I'm not an expert in Python, and even less in wrapping. So I may give it a try. Thanks a lot. – mhavel Feb 26 '12 at 01:08
  • @mhavel: Thanks. I'm no expert in wrapping C++ code for Python, but investigating some of the listed C++ wrappers is likely to be a good investment, for someone interested in fast scientific calculations. – Eric O. Lebigot Feb 26 '12 at 02:33
1

In addition to EOL's answers you could also try

in order to generate a Python interface that deals with NumPy arrays but calls the underlying C/C++ with the appropriate arguments.

I found the docs sufficiently clear to do this for a small scientific C library without ever having done this before or having had vast experience with interfacing C and Python.

orbeckst
  • 3,189
  • 1
  • 17
  • 14