
I'm looking for a vectorized way to do a cumulative sum that resets every time a 0 occurs. For instance, say we have an array ar = np.array([0,1,0,1,1,0,1,0]). The output I want is then np.array([0,1,0,1,2,0,1,0]).

I have the following implementations that work, but they are not completely vectorized.

Method 1:
import pandas as pd
s = pd.Series(ar)
data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()
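To see why Method 1 works, it may help to print the group labels it builds. A small sketch using the example array from the question: `s.eq(0).cumsum()` bumps a counter at every zero, so each zero starts a new group, and `groupby(...).cumsum()` restarts the running total per group.

```python
import numpy as np
import pandas as pd

ar = np.array([0, 1, 0, 1, 1, 0, 1, 0])
s = pd.Series(ar)

# True at every zero; its cumulative sum gives one label per zero-started run
keys = s.eq(0).cumsum()
print(keys.tolist())  # [1, 1, 2, 2, 2, 3, 3, 4]

# Cumulative sum computed separately inside each run
out = s.groupby(keys).cumsum()
print(out.tolist())  # [0, 1, 0, 1, 2, 0, 1, 0]
```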

Method 2:
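import numpy as np

def intervaled_cumsum(ar):
    # Assumes ar[0] == 0 and that ar contains at least two zeros
    # Pieces kept as a list: np.array() on ragged pieces raises ValueError in NumPy >= 1.24
    split = np.split(ar, np.where(ar < 1)[0])[1:]
    sizes = np.array([len(i) for i in split])
    out = ar.copy()

    arc = ar.cumsum()
    idx = sizes.cumsum()  # positions of the 2nd, 3rd, ... zeros, then len(ar)

    # Place corrections at the zeros so that the final cumsum restarts there
    out[idx[0]] = ar[idx[0]] - arc[idx[0]-1]
    out[idx[1:-1]] = ar[idx[1:-1]] - np.diff(arc[idx[:-1]-1])

    return out.cumsum()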
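The `sizes` list comprehension is the only Python-level loop in Method 2 (the bottleneck Divakar points to in the comments). A sketch of a loop-free replacement, assuming the array holds only non-negative integers so that `ar == 0` and `ar < 1` select the same positions; `segment_sizes` is a hypothetical helper name:

```python
import numpy as np

def segment_sizes(ar):
    # Indices of every zero; each one starts a new segment
    zeros = np.flatnonzero(ar == 0)
    # Gaps between consecutive zeros, plus the tail after the last zero
    return np.diff(np.append(zeros, len(ar)))

ar = np.array([0, 1, 0, 1, 1, 0, 1, 0])
print(segment_sizes(ar))  # [2 3 2 1]
```

Because `np.split` cuts at each zero and Method 2 drops the leading piece, the remaining lengths are exactly the gaps between consecutive zeros plus the tail after the last one, which is what `np.diff` computes here.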

How might I do this in Python using any library, really? It could be something other than NumPy.

Shout out to the answers on this thread: Multiple cumulative sum within a numpy array
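One fully vectorized NumPy option, offered as a sketch and not as what the linked answers do: for each position, find the index of the most recent zero with a running maximum (`np.maximum.accumulate`), then subtract the cumulative sum at that zero. Since `ar[z] == 0`, `c[i] - c[z]` equals the sum of `ar[z..i]`. It assumes integer input (with floats, subtracting large running totals can introduce rounding error), and `cumsum_reset_on_zero` is a hypothetical name.

```python
import numpy as np

def cumsum_reset_on_zero(ar):
    ar = np.asarray(ar)
    c = ar.cumsum()
    # Index of each zero, -1 everywhere else
    zero_pos = np.where(ar == 0, np.arange(len(ar)), -1)
    # Running max = index of the most recent zero at or before i (-1 if none yet)
    last_zero = np.maximum.accumulate(zero_pos)
    # Subtract the total accumulated up to that zero; nothing before the first zero
    offset = np.where(last_zero >= 0, c[last_zero], 0)
    return c - offset

print(cumsum_reset_on_zero(np.array([0, 1, 0, 1, 1, 0, 1, 0])))  # [0 1 0 1 2 0 1 0]
```

There is no split and no Python loop, just a handful of whole-array passes, and it also handles arrays that do not start with 0 and values other than 0/1.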

Sotos
  • Method 2 looks familiar! So, the little loop `sizes = np.array([len(i) for i in split])` in an otherwise vectorized approach is your bottleneck? – Divakar Apr 19 '18 at 14:35
  • @Divakar [Hah, so does method 1!](https://stackoverflow.com/a/48805093/4909087) – cs95 Apr 19 '18 at 14:36
  • Oh, the methods are basically taken from https://stackoverflow.com/questions/49178977/multiple-cumulative-sum-within-a-numpy-array. Sorry for not giving credit where it's due! @Divakar –  Apr 19 '18 at 14:38
  • @hedge I recommend looking at [this answer](https://stackoverflow.com/a/48816598/4909087) for performant methods and alternatives. – cs95 Apr 19 '18 at 14:41
  • So if you're coming to Python from MATLAB, vectorization doesn't really work the same way in Python. In MATLAB you need to vectorize almost every single thing to get performance, but sometimes in Python (even in NumPy or similar packages) vectorization is more for convenience than for performance. – Grant Williams Apr 19 '18 at 14:42
  • If you really need performance, I'd consider using Cython here. I think it's a good problem for it, and you should see large gains from it. – Grant Williams Apr 19 '18 at 14:43
  • @Divakar I think you may have linked to the wrong post. It doesn't seem to be a duplicate of this one? Maybe you meant https://stackoverflow.com/questions/49178977/multiple-cumulative-sum-within-a-numpy-array – Grant Williams Apr 19 '18 at 14:50
  • @GrantWilliams The accepted answer to the linked dup Q&A does it - https://stackoverflow.com/a/44421252/. And as asked in the question is fully vectorized. – Divakar Apr 19 '18 at 14:51
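Grant Williams suggests Cython in the comments above. As a hedged illustration (no timings are claimed here), this is the sort of single-pass loop one would compile with Cython or a JIT; run as plain Python it is slow on large arrays but is the simplest correct reference for checking vectorized versions. `cumsum_reset_loop` is a hypothetical name.

```python
import numpy as np

def cumsum_reset_loop(ar):
    ar = np.asarray(ar)
    out = np.empty_like(ar)
    total = 0
    for i, x in enumerate(ar):
        # A zero resets the running total; any other value is added to it
        total = 0 if x == 0 else total + x
        out[i] = total
    return out

print(cumsum_reset_loop(np.array([0, 1, 0, 1, 1, 0, 1, 0])))  # [0 1 0 1 2 0 1 0]
```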

0 Answers