Sums of subarrays

Question

I have a 2d array of integers and I want to sum up 2d sub arrays of it. Both arrays can have arbitrary dimensions, although we can assume that the subarray will be orders of magnitudes smaller than the total array.

The reference implementation in python is trivial:

def sub_sums(arr, l, m):
    result = np.zeros((len(arr) // l, len(arr[0]) // m))
    rows = len(arr) // l * l
    cols = len(arr[0]) // m * m
    for i in range(rows):
        for j in range(cols):
            result[i // l, j // m] += arr[i, j]
    return result

The question is how I do this best using numpy, hopefully without any looping in python at all. For 1d arrays cumsum and r_ would work and I could use that with a bit of looping to implement a solution for 2d, but I'm still learning numpy and I'm almost certain there's some cleverer way.

Example output:

arr = np.asarray([range(0, 5),
                  range(4, 9),
                  range(8, 13),
                  range(12, 17)])
result = sub_sums(arr, 2, 2)

gives:

[[ 0  1  2  3  4]
 [ 4  5  6  7  8]
 [ 8  9 10 11 12]
 [12 13 14 15 16]]

[[ 10.  18.]
 [ 42.  50.]]

I don't how do you get that result. At first I thought you wanted `np.sum(array, axis=1)` to sum rows, or something similar to sum columns, but I don't get how you can obtain that 2x2 matrix as result. The loops you have seems like could be done using slicing (like `array[::l]` or something like that). — Bakuriu, Jan 19 '14 at 18:48
@Bakuriu [0,0] is the sum of the first 2x2 subarray [or more general the first lxm array) (basically `arr[0:2,0:2].sum()`) and so on. — Voo, Jan 19 '14 at 18:50
In that case I believe that doing the slicing and using arrays sums should be faster, especially for big "sub-arrays". Numpy is notoriously *slow* when doing single element computations (and for *slow* I means *slower* than pure python, due to the conversions `numpy` has to do). — Bakuriu, Jan 19 '14 at 18:54
@Bakuriu Certainly, the question is *how* to do exactly that. — Voo, Jan 19 '14 at 18:59

score 4 · Accepted Answer · edited May 23 '17 at 12:15

There is a blockshaped function which does something rather close to what you want:

In [81]: arr
Out[81]: 
array([[ 0,  1,  2,  3,  4],
       [ 4,  5,  6,  7,  8],
       [ 8,  9, 10, 11, 12],
       [12, 13, 14, 15, 16]])

In [82]: blockshaped(arr[:,:4], 2,2)
Out[82]: 
array([[[ 0,  1],
        [ 4,  5]],

       [[ 2,  3],
        [ 6,  7]],

       [[ 8,  9],
        [12, 13]],

       [[10, 11],
        [14, 15]]])

In [83]: blockshaped(arr[:,:4], 2,2).shape
Out[83]: (4, 2, 2)

Once you have the "blockshaped" array, you can obtain the desired result by reshaping (so the numbers in one block are strung out along a single axis) and then calling the sum method on that axis.

So, with a slight modification of the blockshaped function, you can define sub_sums like this:

import numpy as np

def sub_sums(arr, nrows, ncols):
    h, w = arr.shape
    h = (h // nrows)*nrows
    w = (w // ncols)*ncols
    arr = arr[:h,:w]
    return (arr.reshape(h // nrows, nrows, -1, ncols)
               .swapaxes(1, 2)
               .reshape(h // nrows, w // ncols, -1).sum(axis=-1))

arr = np.asarray([range(0, 5),
                  range(4, 9),
                  range(8, 13),
                  range(12, 17)])

print(sub_sums(arr, 2, 2))

yields

[[10 18]
 [42 50]]

Edit: Ophion provides a nice improvement -- use np.einsum instead of reshaping before summing:

def sub_sums_ophion(arr, nrows, ncols):
    h, w = arr.shape
    h = (h // nrows)*nrows
    w = (w // ncols)*ncols
    arr = arr[:h,:w]
    return np.einsum('ijkl->ik', arr.reshape(h // nrows, nrows, -1, ncols))

In [105]: %timeit sub_sums(arr, 2, 2)
10000 loops, best of 3: 112 µs per loop

In [106]: %timeit sub_sums_ophion(arr, 2, 2)
10000 loops, best of 3: 76.2 µs per loop

You should be able to change the return statement to `np.einsum('ijkl->ik',arr.reshape(h // l, l, -1, m))`. Should be much faster as the second reshape would trigger a copy of the array. Reminded me of [this](http://stackoverflow.com/questions/18645013/windowed-maximum-in-numpy/18645174#1864517) question while back. — Daniel, Jan 19 '14 at 19:20
Amazing what you can do with reshaping and swapaxis, really have to get more used to this. Exactly what I was looking for. — Voo, Jan 19 '14 at 19:23

score 2 · Answer 2 · answered Jan 16 '16 at 16:48

Here is the simpler way:

In [160]: import numpy as np

In [161]: arr = np.asarray([range(0, 5),   
                  range(4, 9),
                  range(8, 13),
                  range(12, 17)])

In [162]: np.add.reduceat(arr, [0], axis=1)
Out[162]: 
array([[10],
       [30],
       [50],
       [70]])

In [163]: arr
Out[163]: 
array([[ 0,  1,  2,  3,  4],
       [ 4,  5,  6,  7,  8],
       [ 8,  9, 10, 11, 12],
       [12, 13, 14, 15, 16]])

In [164]: import numpy as np

In [165]: arr = np.asarray([range(0, 5),
                            range(4, 9),
                            range(8, 13),
                            range(12, 17)])

In [166]: arr
Out[166]: 
array([[ 0,  1,  2,  3,  4],
       [ 4,  5,  6,  7,  8],
       [ 8,  9, 10, 11, 12],
       [12, 13, 14, 15, 16]])

In [167]: np.add.reduceat(arr, [0], axis=1)
Out[167]: 
array([[10],
       [30],
       [50],
       [70]])

score 1 · Answer 3 · answered Jan 19 '14 at 19:05

A very small change in your code is to use slicing and perform the sums of the sub-arrays using the sum() method:

def sub_sums(arr, l, m):
    result = np.zeros((len(arr) // l, len(arr[0]) // m))
    rows = len(arr) // l * l
    cols = len(arr[0]) // m * m
    for i in range(len(arr) // l):
        for j in range(len(arr[0]) // m):
            result[i, j] = arr[i*m:(i+1)*m, j*l:(j+1)*l].sum()
    return result

Doing some very simple benchmarks shows that this is slower in the 2x2 case, about equal to your approach in the 3x3 case and faster for bigger sub-arrays (sub_sums2 is your version of the code):

In [19]: arr = np.asarray([range(100)] * 100)

In [20]: %timeit sub_sums(arr, 2, 2)
10 loops, best of 3: 21.8 ms per loop

In [21]: %timeit sub_sums2(arr, 2, 2)
100 loops, best of 3: 9.56 ms per loop

In [22]: %timeit sub_sums(arr, 3, 3)
100 loops, best of 3: 9.58 ms per loop

In [23]: %timeit sub_sums2(arr, 3, 3)
100 loops, best of 3: 9.36 ms per loop

In [24]: %timeit sub_sums(arr, 4, 4)
100 loops, best of 3: 5.58 ms per loop

In [25]: %timeit sub_sums2(arr, 4, 4)
100 loops, best of 3: 9.56 ms per loop

In [26]: %timeit sub_sums(arr, 10, 10)
1000 loops, best of 3: 939 us per loop

In [27]: %timeit sub_sums2(arr, 10, 10)
100 loops, best of 3: 9.48 ms per loop

Notice that with 10x10 sub-arrays it's 1000 times faster. In the 2x2 case it's about twice as slow. Your method basically takes always the same time, while my implementation gets faster with bigger sub-arrays.

I'm pretty sure we can avoid using the for loops explicitly (maybe reshaping the array so that it has the sub-arrays as rows?), but I'm not an expert in numpy and it may take some time before I'll be able to find the final solution. However I believe that 3 orders of magnitude are already a nice improvement.

Sums of subarrays

3 Answers3