Sum elements before and replace element with sum

Question

I have the numpy array

arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])

I need the result array like this:

[[0, 0, 0, 0, 7, 0, 0, 0, 9, 0, 3]
 [0, 0, 3, 0, 0, 0, 0, 0, 12, 0, 0]
 [0, 0, 0, 0, 25, 0, 1, 0, 0, 0, 0]]

What's happened?

We go along the row, if element in row is 0, then we go to the next element , if not 0, then we sum up the elements until 0 is met, once 0 is met, then we replace it with the resulting sum (also replace the initial non-zero numbers with 0

I already know how to do that with loops but it doesn't work well on time for a large number of rows, so I need time-efficient solution in numpy methods

I suggest using `np.frompyfunc` to create a custom ufunc that adds or resets to 0, then use `np.ufunc.accumulate`. — Lærne, Jan 19 '23 at 16:36
you forgot to include that you are also replacing the initial non zero numbers with 0 — Craicerjack, Jan 19 '23 at 16:52
@ScottHunter attempts are not necessary on Stack Overflow, and often make the question **worse**. What *is* necessary is proper *analysis* of the problem - breaking it down into steps and then figuring out which step causes a problem. After all, the goal here is to figure out **how to** do something, **not** to identify a specific problem in a specific attempt. This is not a debugging service. — Karl Knechtel, Jan 20 '23 at 01:38

Pranav Hosangadi · Accepted Answer · 2023-01-20T19:12:12.740

First, we want to find the locations where the array has a zero next to a non-zero.

rr, cc = np.where((arr[:, 1:] == 0) & (arr[:, :-1] != 0))

Now, we can use np.add.reduceat to add elements. Unfortunately, reduceat needs a list of 1-d indices, so we're going to have to play with shapes a little. Calculating the equivalent indices of rr, cc in a flattened array is easy:

reduce_indices = rr * arr.shape[1] + cc + 1
# array([ 4,  8, 10, 13, 19, 26, 28])

We want to reduce from the start of every row, so we'll create a row_starts to mix in with the indices calculated above:

row_starts = np.arange(arr.shape[0]) * arr.shape[1]
# array([ 0, 11, 22])

reduce_indices = np.hstack((row_starts, reduce_indices))
reduce_indices.sort()
# array([ 0,  4,  8, 10, 11, 13, 19, 22, 26, 28])

Now, call np.add.reduceat on the flattened input array, reducing at reduce_indices

totals = np.add.reduceat(arr.flatten(), reduce_indices)
# array([ 7,  9,  3,  0,  3, 12,  0, 25,  1,  1])

Now we have the totals, we need to assign them to an array of zeros. Note that the 0th element of totals needs to go to the 1th index of reduce_indices, and the last element of totals is to be discarded:

result_f = np.zeros((arr.size,))
result_f[reduce_indices[1:]] = totals[:-1]
result = result_f.reshape(arr.shape)

Now, one last step remains. For cases where the last element in a row is nonzero, reduceat would calculate a nonzero value for the first element of the next row, as you mentioned in the comment below. An easy solution is to overwrite these to zero.

result[:, 0] = 0

which gives the expected result:

array([[ 0.,  0.,  0.,  0.,  7.,  0.,  0.,  0.,  9.,  0.,  3.],
       [ 0.,  0.,  3.,  0.,  0.,  0.,  0.,  0., 12.,  0.,  0.],
       [ 0.,  0.,  0.,  0., 25.,  0.,  1.,  0.,  0.,  0.,  0.]])

Good, but for `arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 0, 0], [8, 5, 3, 9, 0, 1, 0, 0, 0, 1, 1], [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0]])` **it fails**. Perhaps at the end of your solution we should add `result[:, 0] = 0`? — qtsar, Jan 20 '23 at 11:34
That is a good point, @qtsar! I had a hunch it would fail when the last element of a row isn't zero but I forgot to test for that case. Your solution is a great easy fix for this issue! — Pranav Hosangadi, Jan 20 '23 at 14:54

Polatkan Polat · Answer 2 · 2023-01-20T01:33:19.810

We can solve using 2 for loops. In every row we will define current_sum and if number is zero we assign current_sum to number and reset current_sum; if number is not zero we assign 0 to number and we increment current_sum.

Edit: Sorry first i didn't realize you want an efficient solution. We can use numba to accelerate for loops. it is really simple and powerful. Here is the code:

import numpy as np
import numba
arr = np.array([[0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0],
                [1, 2, 0, 0, 0, 0, 5, 7, 0, 0, 0],
                [8, 5, 3, 9, 0, 1, 0, 0, 0, 0, 1]])

@numba.jit(nopython=True)
def mySum(array):
    for i in range(array.shape[0]):
        current_sum = 0
        for j in range(array.shape[1]):
            if array[i,j] == 0:
                array[i,j] = current_sum
                current_sum = 0
            else:
                current_sum += array[i,j]
                array[i,j] = 0
    return array

print(mySum(arr))

function is slow in first run because it understands input and function and creates machine code, but after that it is really fast. I hope it is fast enough for your case.

As far as possible, you shouldn't use loops for a numpy array. More often than not, you'll find numpy indexing and operations can do the job, and much more efficiently than looping — Pranav Hosangadi, Jan 20 '23 at 00:55
Oh my bad. I thought op just need a solution, but he wants a time-efficient solution. Yes for loops are bad, but using libraries like numba could really help sometimes. Also every row can be calculated separately so paralel processing may help. — Polatkan Polat, Jan 20 '23 at 01:13

score -1 · Answer 3 · answered Jan 20 '23 at 01:43

Maybe longer than in loop... But let me demonstrate with single array:

a = np.array([0, 0, 2, 5, 0, 0, 1, 8, 0, 3, 0])
zero_index = np.where(a == 0)[0]
# Split zeros, sum each slice, drop the last one
replace_arr = np.array(list(map(sum, np.split(a, zero_index))))[:-1]
output = np.zeros(11)
# Put sum data into zeros array
np.put_along_axis(output, zero_index, replace_arr, axis=0)
output

Sum elements before and replace element with sum

3 Answers3