I need to perform operations on very large arrays (several million entries), whose cumulated size is close to the available memory.
I understand that with a naive numpy expression such as a = a*3 + b - c**2, several temporary arrays are created, which take up additional memory.
As I'm planning to work at the limit of memory occupancy, I'm afraid this simple approach won't work, so I'd like to start my development with the right approach.
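For such a simple expression I could probably fall back on in-place ufunc calls, along these lines (a minimal sketch, where a, b and c are hypothetical same-shaped float arrays just for illustration):

import numpy as np

# Hypothetical arrays, for illustration only
n = 5_000_000
a = np.random.rand(n)
b = np.random.rand(n)
c = np.random.rand(n)

# In-place equivalent of a = a*3 + b - c**2: the first two statements reuse
# the buffer of a, only c**2 still allocates one temporary array
a *= 3
a += b
a -= c**2

But I'm not sure this style scales well to more complex functions.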
I know that packages like numba or pythran can help improve performance when manipulating arrays, but it is not clear to me whether they automatically handle in-place operations and avoid temporary objects.
As a simple example, here is one function I'll have to apply to large arrays:
import numpy as np

def find_bins(a, indices):
    global offset, width, nstep
    # Scale a into bin coordinates, then clamp to [0, nstep]
    i = (a - offset) * nstep / width
    i = np.where(i < 0, 0, i)
    i = np.where(i >= nstep, nstep, i)
    indices[:] = i.astype(int)
So something that mixes arithmetic operations and calls to numpy functions.
How easy would it be to write such functions using numba or pythran (or something else)? What would be the pros and cons in each case?
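To make the question concrete, here is what I naively imagine a numba version could look like (an untested sketch with an explicit loop, and with the globals passed as arguments since I've read that numba freezes global values at compile time):

import numpy as np
from numba import njit

# Untested sketch: same logic as find_bins, written element by element so that
# no temporary arrays are needed at all
@njit
def find_bins_numba(a, indices, offset, width, nstep):
    for k in range(a.shape[0]):
        i = (a[k] - offset) * nstep / width
        if i < 0:
            i = 0
        elif i >= nstep:
            i = nstep
        indices[k] = int(i)

Would something like this actually avoid the temporaries, and is the explicit loop the right way to write it?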
Thanks for any hint!
PS: I know about numexpr, but I'm not sure it is convenient or well suited to functions more complex than a single arithmetic expression.
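For instance, I suppose find_bins could be split into two evaluate calls (again an untested sketch, assuming numexpr's where() function and out= argument behave the way I think they do):

import numpy as np
import numexpr as ne

def find_bins_numexpr(a, indices, offset, width, nstep):
    # First expression: scale into bin coordinates (still allocates one float array)
    i = ne.evaluate("(a - offset) * nstep / width")
    # Second expression: clamp to [0, nstep], writing back into the same buffer
    ne.evaluate("where(i < 0, 0, where(i >= nstep, nstep, i))", out=i)
    indices[:] = i.astype(int)

which already looks less readable to me than the plain numpy version.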