I wrote this function to perform a rolling sum on numpy arrays, inspired by this post:

import numpy as np

def np_rolling_sum(arr, n, axis=0):
    # cumulative sum along the requested axis
    out = np.cumsum(arr, axis=axis)
    # slices selecting out[n:] and out[:-n] along `axis`
    slc1 = [slice(None)] * arr.ndim
    slc2 = [slice(None)] * arr.ndim
    slc1[axis] = slice(n, None)
    slc2[axis] = slice(None, -n)
    # windowed sums: cumsum[i] - cumsum[i - n]
    out = out[tuple(slc1)] - out[tuple(slc2)]
    # pad with zeros at the front so the output has the same shape as arr
    shape = list(out.shape)
    shape[axis] = arr.shape[axis] - out.shape[axis]
    out = np.concatenate((np.full(shape, 0), out), axis=axis)
    return out
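On small inputs it behaves as I expect; for example, with a window of 2:

arr = np.arange(1, 6)          # [1, 2, 3, 4, 5]
print(np_rolling_sum(arr, 2))  # [0 0 5 7 9]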
It works fine, except when I need to use it on large arrays (around 1 billion elements). In that case, the process gets a SIGKILL on this line:
out = out[tuple(slc1)] - out[tuple(slc2)]
I already tried deleting arr right after the cumsum, since I no longer need it (except for its shape, which I can store before the deletion), but it didn't help.
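Concretely, that attempt looked roughly like this (the name np_rolling_sum_del is just for this post; the rest of the function is unchanged apart from using the stored length):

def np_rolling_sum_del(arr, n, axis=0):
    len_along_axis = arr.shape[axis]   # all I still need from arr afterwards
    out = np.cumsum(arr, axis=axis)
    del arr                            # drop my local reference to arr
    slc1 = [slice(None)] * out.ndim
    slc2 = [slice(None)] * out.ndim
    slc1[axis] = slice(n, None)
    slc2[axis] = slice(None, -n)
    out = out[tuple(slc1)] - out[tuple(slc2)]
    shape = list(out.shape)
    shape[axis] = len_along_axis - out.shape[axis]
    out = np.concatenate((np.full(shape, 0), out), axis=axis)
    return out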
My next guess would be to process the operation that causes the memory issue in batches; a rough sketch of what I have in mind is below. Is there a better way to write this function so that it can handle larger arrays?
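This is roughly the batching I have in mind, restricted to axis=0 to keep it simple (the helper name np_rolling_sum_batched and the batch_size default are made up, and I haven't tried it on the real data yet):

def np_rolling_sum_batched(arr, n, batch_size=50_000_000):
    # rolling sum along axis 0, computed chunk by chunk so that the
    # temporaries never hold much more than batch_size + n rows
    out = np.empty_like(arr)  # keeps arr's dtype (np.cumsum may promote it)
    out[:n] = 0
    length = arr.shape[0]
    for start in range(n, length, batch_size):
        stop = min(start + batch_size, length)
        # positions start..stop-1 only need arr[start - n + 1 : stop]
        cw = np.cumsum(arr[start - n + 1:stop], axis=0)
        out[start:stop] = cw[n - 1:]
        out[start + 1:stop] -= cw[:stop - start - 1]
    return out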
Thanks for your help!