
Say I have an array of distances x=[1,2,1,3,3,2,1,5,1,1].

I want to get the indices from x where cumsum reaches 10, in this case, idx=[4,9].

So the cumsum restarts after the condition is met.

I can do it with a loop, but loops are slow for large arrays and I was wondering if I could do it in a vectorized way.

Rahul Agarwal
user3194861
  • Hard to achieve by the vectorized method – BENY Jul 05 '19 at 14:02
  • hmmm looks like it @WeNYoBen – yatu Jul 05 '19 at 14:02
  • 1
    @yatu I think I answer those type question before :-) ...Better solution maybe using numba .. – BENY Jul 05 '19 at 14:04
  • 3
    If you want to reset `cumsum` whenever 10 is reached, something like this could work: `x.cumsum()%10`. If `x` is an array this is really fast. But also this may not work for you, because how you want to handle edge cases is unclear. Like, if `cumsum` is 11, should it reset to 0 or 1? – Brenlla Jul 05 '19 at 14:29
  • No @brenlla that doesn't quite work as it doesn't take into account the remainder of the previous value that may have surpassed 10, try for instance with `[1,2,1,3,6,2,1,5,1,1]` – yatu Jul 05 '19 at 14:32
  • It requires a loop regardless. But for speed you want to loop in fast compiled code. For inherently sequential tasks like this the stock tools of numpy are more limited. – hpaulj Jul 05 '19 at 14:51
  • It's not clear to me what is actually being asked, i.e. why does @Brenlla's suggestion not provide a route to a good, fast answer? – T_M Jun 18 '20 at 09:09
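To make the edge-case discussion in the comments concrete, here is a small sketch (using yatu's counterexample) showing where `x.cumsum() % 10` and a true reset-at-threshold disagree: once a step overshoots the target, modulo carries the overshoot forward, while the desired behavior resets the running total to 0.

```python
import numpy as np

# yatu's counterexample: the running sum overshoots 10 at index 4.
x = np.array([1, 2, 1, 3, 6, 2, 1, 5, 1, 1])

# Modulo trick: carries the overshoot remainder, never marks index 4.
print(x.cumsum() % 10)  # -> [1 3 4 7 3 5 6 1 2 3]

# True reset-at-threshold: restart from 0 after each breach.
total, idx = 0, []
for i, v in enumerate(x):
    total += v
    if total >= 10:
        idx.append(i)
        total = 0
print(idx)  # -> [4, 9]
```

The modulo result never hits 0 (or 10) at index 4, so the breach there is invisible to it, which is exactly the objection raised above.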

3 Answers


A fun method

sumlm = np.frompyfunc(lambda a, b: a + b if a < 10 else b, 2, 1)
newx = sumlm.accumulate(x, dtype=object)  # note: the np.object alias is removed in NumPy 1.24+
newx
array([1, 3, 4, 7, 10, 2, 3, 8, 9, 10], dtype=object)
np.nonzero(newx == 10)

(array([4, 9]),)
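One caveat worth noting (my note, not part of the original answer): `newx == 10` only matches when the running sum lands exactly on the target. If a step can overshoot, as in yatu's counterexample from the comments, the accumulated value passes 10 without ever equalling it, so comparing with `>=` is the safer way to find the breach positions:

```python
import numpy as np

sumlm = np.frompyfunc(lambda a, b: a + b if a < 10 else b, 2, 1)

# Input whose running sum jumps from 7 to 13, skipping 10 entirely.
x = np.array([1, 2, 1, 3, 6, 2, 1, 5, 1, 1])
newx = sumlm.accumulate(x, dtype=object)  # [1, 3, 4, 7, 13, 2, 3, 8, 9, 10]

print(np.nonzero(newx >= 10)[0])  # -> [4 9]
```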
BENY

Loops are not always bad (especially when you need one). Also, there is no tool or algorithm that will make this quicker than O(n), so let's just write a good loop.

Generator Function

def cumsum_breach(x, target):
    total = 0
    for i, y in enumerate(x):
        total += y
        if total >= target:
            yield i
            total = 0

list(cumsum_breach(x, 10))

[4, 9]
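A side benefit of the generator form (my observation, not part of the original answer): it is lazy, so it pairs naturally with `itertools` when only the first few breach indices are needed and the rest of the array can be skipped:

```python
from itertools import islice

def cumsum_breach(x, target):
    total = 0
    for i, y in enumerate(x):
        total += y
        if total >= target:
            yield i
            total = 0

x = [1, 2, 1, 3, 3, 2, 1, 5, 1, 1]

# Pull only the first breach index; no work is done past index 4.
print(next(cumsum_breach(x, 10)))          # -> 4
print(list(islice(cumsum_breach(x, 10), 1)))  # -> [4]
```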

Just In Time compiling with Numba

Numba is a third party library that needs to be installed.
Numba can be persnickety about what features are supported. But this works.
Also, as pointed out by Divakar, Numba performs better with arrays.

from numba import njit

@njit
def cumsum_breach_numba(x, target):
    total = 0
    result = []
    for i, y in enumerate(x):
        total += y
        if total >= target:
            result.append(i)
            total = 0

    return result

cumsum_breach_numba(x, 10)

Testing the Two

Because I felt like it ¯\_(ツ)_/¯

Setup

np.random.seed([3, 1415])
x0 = np.random.randint(100, size=1_000_000)
x1 = x0.tolist()

Accuracy

i0 = cumsum_breach_numba(x0, 200_000)
i1 = list(cumsum_breach(x1, 200_000))

assert i0 == i1

Time

%timeit cumsum_breach_numba(x0, 200_000)
%timeit list(cumsum_breach(x1, 200_000))

582 µs ± 40.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
64.3 ms ± 5.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Numba was on the order of 100 times faster.

For a truer apples-to-apples test, I convert the list to a NumPy array inside the timed call

%timeit cumsum_breach_numba(np.array(x1), 200_000)
%timeit list(cumsum_breach(x1, 200_000))

43.1 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
62.8 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Which brings them to about even.

piRSquared
  • This should be fast enough. In the case of non-negative array, we can make it faster with break when `total > 10`. – Quang Hoang Jul 05 '19 at 14:11
  • May speed up by numba :-) BTW happy 7-4 – BENY Jul 05 '19 at 14:11
  • @piRSquared that is great sir :-) – BENY Jul 05 '19 at 14:23
  • Hi sir, could you please test the speed? I would like to know how bad frompyfunc is on timing compared with numba. – BENY Jul 05 '19 at 14:27
  • @piRSquared Convert to array and then feed it to numba. See the magic! – Divakar Jul 05 '19 at 14:58
  • @Divakar ahh yes. Thank you. Interestingly, The generator version does better with a `list` and Numba better with the `array` – piRSquared Jul 05 '19 at 15:01
  • @piRSquared The cost of converting to array is not huge. So, it's not bad when summed up. – Divakar Jul 05 '19 at 15:02
  • @Divakar pls look into this question. I tried to solve by referring above solution. But I could not... https://stackoverflow.com/questions/61346334/groupby-cumulative-in-pandas-then-update-using-numpy-based-specific-condition – Danish Apr 22 '20 at 07:09

Here's one with numba and array-initialization -

from numba import njit

@njit
def cumsum_breach_numba2(x, target, result):
    total = 0
    iterID = 0
    for i,x_i in enumerate(x):
        total += x_i
        if total >= target:
            result[iterID] = i
            iterID += 1
            total = 0
    return iterID

def cumsum_breach_array_init(x, target):
    x = np.asarray(x)
    result = np.empty(len(x),dtype=np.uint64)
    idx = cumsum_breach_numba2(x, target, result)
    return result[:idx]
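For completeness, here is the same solution as a self-contained sketch that can be run as-is on the question's sample input; the try/except fallback to a no-op decorator is my addition so the snippet still runs where numba isn't installed (at the cost of the speedup, of course):

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # numba is optional; fall back to plain Python
    njit = lambda f: f

@njit
def cumsum_breach_numba2(x, target, result):
    total = 0
    iterID = 0
    for i, x_i in enumerate(x):
        total += x_i
        if total >= target:
            result[iterID] = i  # write into the pre-allocated buffer
            iterID += 1
            total = 0
    return iterID  # number of breaches found

def cumsum_breach_array_init(x, target):
    x = np.asarray(x)
    result = np.empty(len(x), dtype=np.uint64)  # worst case: every element breaches
    idx = cumsum_breach_numba2(x, target, result)
    return result[:idx]  # trim to the part actually filled

print(cumsum_breach_array_init([1, 2, 1, 3, 3, 2, 1, 5, 1, 1], 10))  # -> [4 9]
```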

Timings

Including @piRSquared's solutions and using the benchmarking setup from the same post -

In [58]: np.random.seed([3, 1415])
    ...: x = np.random.randint(100, size=1000000).tolist()

# @piRSquared soln1
In [59]: %timeit list(cumsum_breach(x, 10))
10 loops, best of 3: 73.2 ms per loop

# @piRSquared soln2
In [60]: %timeit cumsum_breach_numba(np.asarray(x), 10)
10 loops, best of 3: 69.2 ms per loop

# From this post
In [61]: %timeit cumsum_breach_array_init(x, 10)
10 loops, best of 3: 39.1 ms per loop

Numba : Appending vs. array-initialization

For a closer look at how the array-initialization helps, which seems to be the big difference between the two numba implementations, let's time them on array data alone, since creating the array was itself heavy on runtime and both solutions depend on it -

In [62]: x = np.array(x)

In [63]: %timeit cumsum_breach_numba(x, 10)# with appending
10 loops, best of 3: 31.5 ms per loop

In [64]: %timeit cumsum_breach_array_init(x, 10)
1000 loops, best of 3: 1.8 ms per loop

To force the output to have its own memory space, we can make a copy. It won't change things in a big way, though -

In [65]: %timeit cumsum_breach_array_init(x, 10).copy()
100 loops, best of 3: 2.67 ms per loop
Divakar
  • It probably won't make much of a difference but I think to be entirely fair you should return a copy of `result[:idx]` to avoid leaking the memory of `result[idx:]`. – Paul Panzer Jul 05 '19 at 15:36
  • Great answer! Not surprised but just saying – piRSquared Jul 06 '19 at 21:42
  • @piRSquared Surely motivated by yours. I think such an array-initialization-based approach could be used for numba solutions that would otherwise need appending when the output size isn't known beforehand. Also, array data works better with numba. Those are the two learnings from this Q&A. – Divakar Jul 06 '19 at 21:45
  • Yeah great technique. I’ll be using it – piRSquared Jul 06 '19 at 21:46