Optimizing a rebin of a numpy array to arbitrary binsize

Question

I'm building on this question. I'm re-binning a numpy array using the solution posted there, with a small addition to handle the leftover elements:

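from numpy import arange, append

bin = 6  # bin size (the name shadows the built-in bin(), kept from the original)
x = arange(20)
n = (x.shape[0] // bin) * bin  # number of elements covered by full bins
binned = x[:n].reshape((-1, bin)).mean(1)
x = append(binned, x[n:].mean())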

This handles the case where bin does not divide x.shape[0]: the append adds the average of the remaining cells. The problem is that I'm creating a lot of intermediate arrays here, which costs memory and can't be good for runtime either. Is there a better way? I'm even considering converting to lists, re-binning there, and returning array(result) at the end.

To be clear, for bin=6 the reshape-and-mean step yields:

array([  2.5,   8.5,  14.5])

and the append step adds:

18.5

Before the mean is taken, the intermediate arrays are, first:

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])

and the second:

array([18, 19])

The final result is of course:

array([  2.5,   8.5,  14.5,  18.5])

@Divakar more an opening parenthesis, I'll add your suggestions. — kabanus, Dec 06 '16 at 12:38

Daniel F · Answer 1 · 2016-12-06T13:24:41.053

This should work, I think, if you absolutely want one array:

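import numpy as np

def rebin(x, bin):
    pad = -x.size % bin  # zeros needed to reach a multiple of bin (0 if none)
    x_pad = np.pad(x, (0, pad), 'constant').reshape(-1, bin)  # one row per bin
    # full bins use the plain mean; the last bin divides by its true length
    return np.hstack((x_pad[:-1].mean(axis=1), x_pad[-1].sum() / (bin - pad)))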

but I think it's cleaner and easier to do it like this:

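def rebin(x,bin):
    # split at multiples of bin so each chunk holds bin elements (the last may be shorter)
    return np.array([a.mean() for a in np.array_split(x, np.arange(bin, x.size, bin))])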

But that won't be faster.
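To check the speed claim on your own data, a rough timing sketch like the one below may help. The function names `rebin_reshape` and `rebin_split` and the `bin_size` parameter are made up for this illustration, and the split points are passed explicitly on the assumption that `bin` means the bin length. Timings will vary with machine and array size, so no numbers are quoted here.

```python
import timeit
import numpy as np

def rebin_reshape(x, bin_size):
    # Vectorised: full bins via reshape, leftover elements handled separately.
    n = x.size // bin_size
    full = x[:n * bin_size].reshape(-1, bin_size).mean(axis=1)
    rest = x[n * bin_size:]
    return np.append(full, rest.mean()) if rest.size else full

def rebin_split(x, bin_size):
    # Python-level loop over the chunks produced by np.array_split.
    chunks = np.array_split(x, np.arange(bin_size, x.size, bin_size))
    return np.array([c.mean() for c in chunks])

x = np.arange(100_003)  # length deliberately not a multiple of 6
assert np.allclose(rebin_reshape(x, 6), rebin_split(x, 6))

for f in (rebin_reshape, rebin_split):
    # Total seconds for 5 calls; compare the two figures relative to each other.
    print(f.__name__, timeit.timeit(lambda: f(x, 6), number=5))
```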

edited Dec 06 '16 at 13:24

answered Dec 06 '16 at 13:18

Divakar · Accepted Answer · 2016-12-06T13:24:29.647

Approach #1 : If you care about memory, it might be better to initialize the output array and then assign values into it in two steps just like in the original code but without appending, like so -

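import numpy as np

n = x.size//bin  # number of full bins
out = np.empty((x.size-1 + bin)//bin)  # ceil(x.size/bin) output slots
out[:n] = x[:bin*n].reshape(-1,bin).mean(1)  # means of the full bins
out[n:] = x[-x.size+n*bin:].mean()  # mean of the leftover elements, if any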
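For reuse, the two-step assignment above could be wrapped in a function along these lines. This is only a sketch: the name `rebin_prealloc` and the parameter `bin_size` (used instead of `bin` to avoid shadowing the built-in) are not from the answer, and the explicit check for leftovers is a small variation on the negative-index slice.

```python
import numpy as np

def rebin_prealloc(x, bin_size):
    """Mean over consecutive bins of length bin_size; the last bin may be shorter."""
    x = np.asarray(x)
    n = x.size // bin_size                        # number of full bins
    n_out = (x.size + bin_size - 1) // bin_size   # ceil(x.size / bin_size)
    out = np.empty(n_out)                         # single output allocation
    out[:n] = x[:n * bin_size].reshape(-1, bin_size).mean(axis=1)
    if n_out > n:                                 # leftover elements exist
        out[n] = x[n * bin_size:].mean()
    return out

result = rebin_prealloc(np.arange(20), 6)
print(result)  # values: 2.5, 8.5, 14.5, 18.5
```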

Approach #2 : Here's another approach with focus on memory efficiency with np.add.reduceat -

out = np.add.reduceat(x, bin*np.arange((x.size-1+bin)//bin)).astype(float)  # per-bin sums
out[:n] /= bin  # n = x.size//bin, as in Approach #1
out[n:] /= x.size - n*bin  # leftover bin divides by its own length
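A self-contained version of the `np.add.reduceat` idea might look like the following sketch. The function name `rebin_reduceat` and the parameter `bin_size` are illustrative choices, not part of the answer, and the guard on the remainder is added here so the evenly divisible case never divides by zero.

```python
import numpy as np

def rebin_reduceat(x, bin_size):
    """Bin means via np.add.reduceat; the last bin may be shorter than bin_size."""
    x = np.asarray(x)
    n = x.size // bin_size                       # number of full bins
    rem = x.size - n * bin_size                  # length of the leftover bin
    starts = bin_size * np.arange((x.size + bin_size - 1) // bin_size)
    out = np.add.reduceat(x, starts).astype(float)  # sum of each bin
    out[:n] /= bin_size                          # full bins
    if rem:
        out[n:] /= rem                           # leftover bin
    return out

result = rebin_reduceat(np.arange(20), 6)
print(result)  # values: 2.5, 8.5, 14.5, 18.5
```

The per-bin sums here are 15, 51, 87 and 37, so dividing the first three by 6 and the last by 2 reproduces the result from the question.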

Alternatively, another way to get the grouped summations as done with np.add.reduceat() would be with np.bincount -

np.bincount(np.arange(x.size)//bin,x)
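The `np.bincount` call above returns sums only. One way to turn them into means, sketched here under the assumption that `x` is one-dimensional, is to call `np.bincount` a second time without weights to get the bin sizes. The names `rebin_bincount` and `bin_size` are hypothetical. Note that this builds an index array as long as `x`, so it trades some memory for brevity.

```python
import numpy as np

def rebin_bincount(x, bin_size):
    """Bin means from two bincount passes: weighted sums and element counts."""
    x = np.asarray(x)
    idx = np.arange(x.size) // bin_size    # bin label for every element
    sums = np.bincount(idx, weights=x)     # per-bin sums (float)
    counts = np.bincount(idx)              # per-bin element counts
    return sums / counts

result = rebin_bincount(np.arange(20), 6)
print(result)  # values: 2.5, 8.5, 14.5, 18.5
```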
