I'm using np.roll() to do nearest-neighbor averaging, but I have a feeling there are faster ways. Here is a simplified example, but imagine three dimensions and more complex averaging "stencils" (for example, see section 6 of this paper).
Here are a few lines from that simplified example:
    for j in range(nper):
        # average the four nearest neighbors (periodic wrap via roll)
        phi2 = 0.25*(np.roll(phi,  1, axis=0) +
                     np.roll(phi, -1, axis=0) +
                     np.roll(phi,  1, axis=1) +
                     np.roll(phi, -1, axis=1))
        phi[do_me] = phi2[do_me]
So should I be looking for something that returns views instead of new arrays (it seems roll returns a new array)? Does roll allocate a fresh array each time it's called? I've noticed the overhead is huge for small arrays.
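To make the question concrete, here is the kind of slice-based alternative I have in mind (a rough sketch with placeholder sizes and a placeholder do_me mask; the slices are views, so unlike roll the four operands don't each allocate a shifted copy, and since I'm discarding the wrapped parts anyway this skips the wraparound entirely):

    import numpy as np

    n = 200
    phi = np.random.rand(n, n)

    # placeholder mask: update interior points only, hold the edges fixed
    do_me = np.zeros_like(phi, dtype=bool)
    do_me[1:-1, 1:-1] = True

    nper = 100
    for j in range(nper):
        phi2 = np.empty_like(phi)  # edges stay uninitialized; do_me never touches them
        # each phi[...] slice below is a view (no copy); only the sum itself
        # allocates, whereas np.roll materializes four full shifted arrays
        phi2[1:-1, 1:-1] = 0.25*(phi[:-2, 1:-1] + phi[2:, 1:-1] +
                                 phi[1:-1, :-2] + phi[1:-1, 2:])
        phi[do_me] = phi2[do_me]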
In fact, on my laptop it's most efficient per element for arrays between [100, 100] and [300, 300] in size; above that, I suspect caching effects.
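For reference, this is roughly how I measured that (a quick timeit sketch; the sizes and repeat count are arbitrary):

    import timeit
    import numpy as np

    for n in (30, 100, 300, 1000, 3000):
        phi = np.random.rand(n, n)
        t = timeit.timeit(
            lambda: 0.25*(np.roll(phi,  1, axis=0) +
                          np.roll(phi, -1, axis=0) +
                          np.roll(phi,  1, axis=1) +
                          np.roll(phi, -1, axis=1)),
            number=20)
        # normalize to time per element so different sizes are comparable
        print("n = %4d: %7.2f ns per element" % (n, 1e9*t/20/n**2))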
Would scipy.ndimage.interpolation.shift() perform better, the way it is implemented here, and if so, is it fixed? In the linked example above I'm throwing away the wrapped parts anyway, but that might not always be the case.
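For the shift() comparison I'd sanity-check equivalence before timing anything; something like this (scipy.ndimage.shift is the current name for the same function, and order=0 with mode='grid-wrap', available in SciPy >= 1.6, is my guess at the settings that reproduce np.roll's periodic wrap):

    import numpy as np
    from scipy.ndimage import shift

    phi = np.random.rand(200, 200)

    rolled = np.roll(phi, 1, axis=0)
    # order=0 means no spline interpolation; 'grid-wrap' wraps to the
    # opposite edge, which should match roll for an integer shift
    shifted = shift(phi, (1, 0), order=0, mode='grid-wrap')

    print(np.allclose(rolled, shifted))  # expect True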
Note: in this question I'm looking only for what is available within NumPy/SciPy. Of course there are many good ways to speed up Python and even NumPy, but that's not what I'm after here, because I'm really trying to understand NumPy itself better. Thanks!