
Imagine I have an n x d NumPy array, e.g. a=np.array([[1,2,3],[4,5,6], [7,8,9], [10,11,12], [13,14,15]])

So in this case n=5 and d=3. I also have a number c that is less than or equal to n, and what I want to calculate is the following:

Consider every column independently and sum each run of c consecutive values; e.g. if c=2, the solution would be

solution=np.array([[1+4, 2+5, 3+6], [7+10,8+11,9+12]])

The last row is skipped because 5 mod 2 = 1, so one row has to be left out at the end.

If c=1, the solution would be the original array, and if e.g. c=3 the solution would be

solution=np.array([[1+4+7, 2+5+8, 3+6+9]]), while the last two rows are omitted.

What would be the most elegant and efficient way to do that? I have searched a lot online but could not find a similar problem.

Mark

1 Answer


Here's one way -

def sum_in_blocks(a, c):
    # Number of leading rows to keep: the largest multiple of c not exceeding len(a)
    l = c*(len(a)//c)

    # Reshape the first l rows to 3D, "cutting" after every c rows,
    # then sum along the second axis (the c rows within each block)
    return a[:l].reshape(-1,c,a.shape[1]).sum(axis=1)

More info on the second step: see "General idea for nd to nd transformation".
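On the efficiency side, here is a small check of what the reshape step produces. It is only a sketch, assuming `a` is an ordinary C-contiguous array like the sample one; the name `blocks` is just for illustration. In that case the slice plus reshape is a view on the original data, so only the final sum allocates a new array.

```python
import numpy as np

a = np.arange(1, 16).reshape(5, 3)   # same values as the sample array
c = 2
l = c * (len(a) // c)                # 4 rows survive the truncation

# Each entry along the first axis is one block of c consecutive rows
blocks = a[:l].reshape(-1, c, a.shape[1])

print(blocks.shape)                  # (2, 2, 3)
print(np.shares_memory(a, blocks))   # True: no copy for contiguous input
```

For a non-contiguous input (for example a transposed array), `reshape` may have to copy, so this observation should not be taken as general.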

Sample runs -

In [79]: a
Out[79]: 
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

In [80]: sum_in_blocks(a, c=1)
Out[80]: 
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

In [81]: sum_in_blocks(a, c=2)
Out[81]: 
array([[ 5,  7,  9],
       [17, 19, 21]])

In [82]: sum_in_blocks(a, c=3)
Out[82]: array([[12, 15, 18]])
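As a sanity check, the reshape version can be compared against a plain loop that does literally what the question describes. `sum_in_blocks_loop` is a hypothetical helper written only for this comparison, not part of the original answer, and it assumes 1 <= c <= len(a).

```python
import numpy as np

def sum_in_blocks(a, c):
    l = c * (len(a) // c)
    return a[:l].reshape(-1, c, a.shape[1]).sum(axis=1)

def sum_in_blocks_loop(a, c):
    # Hypothetical reference: sum each complete run of c rows explicitly
    n_blocks = len(a) // c
    return np.array([a[i * c:(i + 1) * c].sum(axis=0) for i in range(n_blocks)])

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])

# Both versions should agree for every block size tried here
for c in (1, 2, 3, 5):
    assert np.array_equal(sum_in_blocks(a, c), sum_in_blocks_loop(a, c))
```

The loop is easier to read but runs one Python-level iteration per block, which is the cost the reshape version avoids.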

Explanation with given sample

In [84]: a
Out[84]: 
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])

In [85]: c = 2

In [87]: l = c*(len(a)//c) # = 4; number of rows to keep (largest multiple of c)

In [89]: a[:l] # so the leftover last row is dropped
Out[89]: 
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

# Reshape to 3D "cutting" after every c=2 rows
In [90]: a[:l].reshape(-1,c,a.shape[1])
Out[90]: 
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

# Sum along axis=1 for the final output
In [91]: a[:l].reshape(-1,c,a.shape[1]).sum(axis=1)
Out[91]: 
array([[ 5,  7,  9],
       [17, 19, 21]])
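
If the 3D reshape feels opaque, a different technique that should give the same result is `np.add.reduceat`, which sums slices of rows between given start indices. This is an alternative sketch, not the method used above; `sum_in_blocks_reduceat` is a made-up name, and the code assumes 1 <= c <= len(a).

```python
import numpy as np

def sum_in_blocks_reduceat(a, c):
    # Keep only the rows that form complete blocks of c
    l = c * (len(a) // c)
    # Start index of each block: 0, c, 2c, ...
    starts = np.arange(0, l, c)
    # Sum rows starts[i]:starts[i+1]; the last block runs to the end of a[:l]
    return np.add.reduceat(a[:l], starts, axis=0)

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
print(sum_in_blocks_reduceat(a, 2))
```

Truncating to `a[:l]` first matters here, because otherwise the last block would also absorb the leftover rows.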
Divakar
  • I have no idea what you are doing in the second step but I applied it to my data and it works like a charm.. Thanks – Mark Nov 24 '19 at 16:28
  • @Mark Added step-by-step explanation using a sample. That should make it easier to understand. – Divakar Nov 24 '19 at 16:57