
I have an array looking like this:

testdata = [-2, -1, 0, 1, 2, 3, 10, 3, 2, 1, 0, -1, -2]

So it has a single maximum value, and to the left and right of it the values gradually go down to zero; beyond that the array can contain values below 0.

The aim of my code is to find the maximum value of the array and to sum all the values to its left and right up to the first 0 on each side (including the maximum value itself). For the example above that would be 1 + 2 + 3 + 10 + 3 + 2 + 1 = 22.

To test my code I generated a larger array like this (ignoring possible values smaller than 0):

data1 = [x for x in range(0, 100000, 1)]
data2 = [x for x in range(100000, -1, -1)]

data3 = data1 + data2

The fastest code I can come up with looks like this:

import numpy as np

j = 1
k = 0

max_index = np.where(data3 == np.amax(data3))[0][0]

while data3[max_index + j] > 0:
    j += 1

while data3[max_index - k] > 0:
    k += 1

summ1 = np.sum(data3[max_index:(max_index+j)]) 
summ2 = np.sum(data3[(max_index-k):max_index])
total = summ1 + summ2

print(total)

Any suggestions on how to make this even faster?

Jailbone
  • This looks like a simple pencil-and-paper problem. If the decrease is gradual, no loops are needed, only the arithmetic progression formula. – mathfux Oct 27 '20 at 22:49
  • It was just for illustration purposes. The values go down to zero gradually but don't follow any arithmetic progression. It's more like declining random noise. Sorry for the confusion. – Jailbone Oct 27 '20 at 22:58
  • Assuming there is only one peak and the values are weakly decreasing on both sides, you could just use `np.maximum(data3, 0).sum()`. In other words, replace all negative values with zeros and sum over the entire array. It helps performance if `data3` is a numpy array to begin with. – hilberts_drinking_problem Oct 28 '20 at 04:07
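
Spelling out the suggestion from the last comment as runnable code (a minimal sketch, assuming a single peak with only non-negative values between the two zeros around it, and that `data3` has been built as above):

import numpy as np

data3 = np.asarray(data3)            # clipping and summing are fastest on a NumPy array
total = np.maximum(data3, 0).sum()   # replace negative values with 0, then sum everything
print(total)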

2 Answers


It depends a lot on the data. It seems like you're trying to find an efficient way to return the first index of something in an array. There isn't an efficient way to do that in numpy, because numpy can only iterate over the whole array, but you can use numba instead in order to outperform numpy.

In case you need to sum a large part of your list, numpy is a good choice:

data3 = np.asarray(data3)                               # the code below expects a NumPy array
zero_idx = np.where(data3 == 0)[0]                      # indices of all zeros
max_loc = np.searchsorted(zero_idx, np.argmax(data3))   # first zero to the right of the peak
start, end = zero_idx[max_loc - 1], zero_idx[max_loc]   # zeros enclosing the peak
total_sum = np.sum(data3[start:end])

Otherwise, use the pythonic `index` method (or numba):

data3 = np.asarray(data3)           # list.index is used below, but the slicing needs an array
k = np.argmax(data3)
left_list = data3[k:].tolist()      # the peak and everything to its right
right_list = data3[k::-1].tolist()  # the peak and everything to its left, reversed
s1 = np.sum(data3[k: k + left_list.index(0)])   # peak up to the first zero on the right
s2 = np.sum(data3[k - right_list.index(0): k])  # first zero on the left up to the peak
total_sum = s1 + s2

Benchmarks. Using the %timeit magic in a Jupyter Notebook, I found the first method to be about 20x faster:

512 µs ± 34.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
10.2 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
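
Since numba was mentioned above, here is a minimal sketch of the loop-based search compiled with numba (assuming numba is installed; the function name is purely illustrative):

import numpy as np
from numba import njit

@njit
def peak_sum(arr):
    # Sum the values around the global maximum until the first zero
    # (or the array boundary) is reached on each side.
    k = np.argmax(arr)
    total = arr[k]
    i = k + 1
    while i < arr.size and arr[i] > 0:   # walk right
        total += arr[i]
        i += 1
    i = k - 1
    while i >= 0 and arr[i] > 0:         # walk left
        total += arr[i]
        i -= 1
    return total

print(peak_sum(np.asarray(data3)))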
mathfux

You can use masking instead of using loops.

The masks `data3[max_index:] > 0` and `data3[:max_index] > 0` are equivalent to the slices `[max_index:(max_index+j)]` and `[(max_index-k):max_index]`, except you don't have to bother finding `j` and `k` yourself.

from contextlib import contextmanager
import numpy as np
import time

@contextmanager
def time_this_scope(name):
    """Handy context manager to time a portion of code."""
    t0 = time.perf_counter()
    yield
    print(f"{name} took {time.perf_counter() - t0}s.")

# Preparing the data.
data1 = [x for x in range(0, 100000, 1)]
data2 = [x for x in range(100000, -1, -1)]
data3 = data1 + data2
max_index = np.where(data3 == np.amax(data3))[0][0]
    
# Comparing the performance of both methods.
with time_this_scope("method 1"):
    j = 1
    k = 0
    while data3[max_index + j] > 0:
        j += 1    
    while data3[max_index - k] > 0:
        k += 1
    summ1 = np.sum(data3[max_index:(max_index+j)]) 
    summ2 = np.sum(data3[(max_index-k):max_index])
    total_m1 = summ1 + summ2

with time_this_scope("method 2"):    
    data3 = np.array(data3)
    summ1 = np.sum(data3[max_index:][data3[max_index:] > 0])
    summ2 = np.sum(data3[:max_index][data3[:max_index] > 0])      
    total_m2 = summ1 + summ2
    
# Checking they do yield the same result.
print(total_m1 == total_m2)

>>> method 1 took 0.08157979999998588s.
>>> method 2 took 0.011274500000013177s.
>>> True
Guimoute
  • Thanks! That looks neat, did not know of that functionality before! I'm wondering, if the array gets larger, would it be sensible to parallelize the summation of the left and right branch using threading/multiprocessing? – Jailbone Oct 27 '20 at 23:22
  • @Jailbone It appears that `numba` is possibly a good choice for parallelizing the `index` method, but it depends a lot on the data you expect. Another option is to use the `numexpr` package for further improvement of `np.sum` once you find which part of `data3` to sum. – mathfux Oct 28 '20 at 00:01
  • Thanks for the great info! I am curious about the numexpr package. Could you maybe provide an example of how to use it in this case? I don't quite understand how it works. – Jailbone Oct 28 '20 at 00:27
  • @Jailbone It is quite easy. For example, a substitute for `a*b+b*c+c*a` over large arrays `a,b,c = (np.random.randint(100, size = 1000000),) * 3` is: `import numexpr as ne;` `ne.evaluate('a*b+b*c+c*a', {'a':a, 'b':b, 'c':c})`. However, I've just noticed it doesn't help to speed up `np.sum` because `np.sum` is already quite fast by itself. – mathfux Oct 28 '20 at 00:34
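
To make that last comment concrete, here is a minimal runnable version of the `numexpr` example (the random arrays are purely illustrative, and the snippet assumes the `numexpr` package is installed):

import numpy as np
import numexpr as ne

# Three large example arrays (illustrative data only).
a = np.random.randint(100, size=1_000_000)
b = np.random.randint(100, size=1_000_000)
c = np.random.randint(100, size=1_000_000)

# numexpr evaluates the whole expression in one multithreaded pass instead of
# allocating an intermediate NumPy array for every operation.
result = ne.evaluate('a*b + b*c + c*a', {'a': a, 'b': b, 'c': c})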