Vectorized cumulative sum based on value in array numpy

Question

I'm looking for a vectorized way to compute a cumulative sum that resets every time a 0 occurs. For instance, say we have an array ar = np.array([0,1,0,1,1,0,1,0]). The output I want is then np.array([0,1,0,1,2,0,1,0]).
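
To pin down the intended behaviour, here is a plain-loop reference, written as a minimal sketch. The function name `cumsum_reset_reference` is made up for illustration, and the loop is deliberately not vectorized; it is only meant as a baseline to check faster versions against.

```python
def cumsum_reset_reference(values):
    """Running sum that restarts from 0 whenever a 0 is encountered."""
    out = []
    running = 0
    for v in values:
        # A zero resets the running total; anything else is added to it.
        running = 0 if v == 0 else running + v
        out.append(running)
    return out


print(cumsum_reset_reference([0, 1, 0, 1, 1, 0, 1, 0]))
```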

I have the following implementations that work, but they are not completely vectorized.

Method 1:
import pandas as pd
s = pd.Series(ar)
data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()

Method 2:
def intervaled_cumsum(ar):
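    # Keep the pieces as a plain list: they have different lengths, and wrapping
    # a ragged list in np.array raises ValueError on NumPy 1.24 and later.
    split = np.split(ar, np.where(ar<1)[0])[1:]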
    sizes = np.array([len(i) for i in split])
    out = ar.copy()

    arc = ar.cumsum()
    idx = sizes.cumsum()

    out[idx[0]] = ar[idx[0]] - arc[idx[0]-1]
    out[idx[1:-1]] = ar[idx[1:-1]] - np.diff(arc[idx[:-1]-1])

    return out.cumsum()

How might I do this in Python? Any library is fine; it doesn't have to be NumPy.

Shout-out to the answers on this thread: Multiple cumulative sum within a numpy array

Method 2 looks familiar! So, the little loop `sizes = np.array([len(i) for i in split])` in an otherwise vectorized approach is your bottleneck? — Divakar, Apr 19 '18 at 14:35
@Divakar [Hah, so does method 1!](https://stackoverflow.com/a/48805093/4909087) — cs95, Apr 19 '18 at 14:36
Oh, the methods are basically taken from https://stackoverflow.com/questions/49178977/multiple-cumulative-sum-within-a-numpy-array. Sorry for not giving credit where it's due! @Divakar — , Apr 19 '18 at 14:38
@hedge I recommend looking at [this answer](https://stackoverflow.com/a/48816598/4909087) for performant methods and alternatives. — cs95, Apr 19 '18 at 14:41
So if you're coming to Python from Matlab, vectorization doesn't really work the same way in Python. In Matlab you need to vectorize almost every single thing to get performance, but sometimes in Python (even in numpy or similar packages) the vectorization is more for convenience than for performance. — Grant Williams, Apr 19 '18 at 14:42
If you really need performance I'd consider using Cython here. I think it's a good problem for it, and you should see large gains from it. — Grant Williams, Apr 19 '18 at 14:43
@Divakar I think you may have linked to the wrong post. It doesn't seem to be a duplicate of this one? Maybe you meant https://stackoverflow.com/questions/49178977/multiple-cumulative-sum-within-a-numpy-array — Grant Williams, Apr 19 '18 at 14:50
@GrantWilliams The accepted answer to the linked dup Q&A does it - https://stackoverflow.com/a/44421252/. And, as asked in the question, it is fully vectorized. — Divakar, Apr 19 '18 at 14:51
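
For readers who want something concrete, here is one possible fully vectorized sketch in NumPy. It is not necessarily the method used in the linked answers. The function name `cumsum_reset_at_zero` is made up for illustration, and the approach assumes every entry of the array is non-negative, so that the plain cumulative sum never decreases.

```python
import numpy as np


def cumsum_reset_at_zero(ar):
    """Cumulative sum that restarts at every 0 (assumes non-negative input)."""
    ar = np.asarray(ar)
    total = ar.cumsum()
    # Record the running total at each zero position, 0 elsewhere.
    total_at_zero = np.where(ar == 0, total, 0)
    # Because the totals never decrease (non-negative input), a running
    # maximum carries the most recent reset value forward.
    offset = np.maximum.accumulate(total_at_zero)
    # Subtracting the offset restarts the count after each zero.
    return total - offset


ar = np.array([0, 1, 0, 1, 1, 0, 1, 0])
print(cumsum_reset_at_zero(ar))
```

The whole computation uses three array passes (`cumsum`, `np.where`, `np.maximum.accumulate`) and no Python-level loop. If the array can contain negative values, the running-maximum step no longer tracks the latest reset and a different approach is needed.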
