Refactoring pandas cumulative count of a grouped cumulative sum to numpy

Question

I have the following function, which takes a pandas Series and returns the cumulative count within groups defined by a cumulative sum over a binary target (0 or 1). The example below shows the expected transformation (the input is always a binary sequence). My code currently calls this function in a loop over many pandas Series; I am refactoring it to operate on 3-dimensional NumPy arrays instead. The code is also performance-critical, and I believe NumPy will provide the speed I need. However, none of my attempted solutions gives the correct result: I can't get an equivalent of the groupby to work. Can someone help me refactor this code into NumPy?

[0,0,0,1,1,1,0,1] --> [0,0,0,1,2,3,0,1]

import pandas as pd
import numpy as np
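def calculate_binary_momentum(series, target):
    # tmp_target is the other binary value, i.e. the one whose runs are counted
    tmp_target = 1 if target == 0 else 0
    # Every element that differs from tmp_target (i.e. equals target) starts
    # a new group; cumcount numbers the elements of each group from 0.
    s = series.groupby((series != tmp_target).cumsum()).cumcount()
    return s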

a = np.array([[0,0,0,1,1,1,0,1], [1,1,0,0,1,1,1,0]])
b = pd.Series(a[0, :])
c = calculate_binary_momentum(b, 0)
print(c)

# Desired usage: calculate_binary_momentum_3d_np is the function I am trying to write
inp = np.array([[[0,0,0,1,1,1,0,1], [1,1,0,0,1,1,1,0]], [[0,1,0,1,0,1,0,1], [0,1,1,1,1,1,1,0]]])
out = calculate_binary_momentum_3d_np(inp, 0)
# out --> [[[0,0,0,1,2,3,0,1], [1,2,0,0,1,2,3,0]], [[0,1,0,1,0,1,0,1], [0,1,2,3,4,5,6,0]]]

This is different from the following question because I am looking for a higher-dimensional implementation. I believe the generalization to more dimensions is non-trivial and therefore merits reopening: Counting consecutive 1's in NumPy array
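
One possible NumPy sketch of the 3D function asked for above, assuming runs are counted along the last axis and the counter resets to 0 wherever the value equals target. It avoids groupby by recording the position of the most recent reset with np.maximum.accumulate and subtracting it from each element's position. The name calculate_binary_momentum_3d_np comes from the question; inp, pos, and last_reset are illustrative names. The sketch follows the desired output listed in the question, in which a row that begins with a run (such as [1,1,0,...]) counts from 1. The pandas groupby/cumcount version numbers such a leading run from 0 instead, so the two differ on those rows.

```python
import numpy as np

def calculate_binary_momentum_3d_np(arr, target):
    """Run-length counter along the last axis (sketch, not a drop-in replacement).

    The counter is 0 wherever arr == target and increases by 1 for each
    consecutive element that differs from target.
    """
    arr = np.asarray(arr)
    reset = arr == target
    # 1-based position of every element along the last axis
    pos = np.arange(1, arr.shape[-1] + 1)
    # Position of the most recent reset at or before each element
    # (stays 0 while no reset has been seen yet in that row)
    last_reset = np.maximum.accumulate(np.where(reset, pos, 0), axis=-1)
    # Distance from the most recent reset is the run length so far
    return pos - last_reset

inp = np.array([[[0, 0, 0, 1, 1, 1, 0, 1], [1, 1, 0, 0, 1, 1, 1, 0]],
                [[0, 1, 0, 1, 0, 1, 0, 1], [0, 1, 1, 1, 1, 1, 1, 0]]])
out = calculate_binary_momentum_3d_np(inp, 0)
print(out.tolist())
# [[[0, 0, 0, 1, 2, 3, 0, 1], [1, 2, 0, 0, 1, 2, 3, 0]],
#  [[0, 1, 0, 1, 0, 1, 0, 1], [0, 1, 2, 3, 4, 5, 6, 0]]]
```

Because pos broadcasts against the trailing axis, the same function should work for 1D, 2D, or 3D input. If an exact match with the pandas function is needed, subtracting 1 wherever last_reset is still 0 should reproduce its numbering of a leading run.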

Can you explain why the desired output is the correct output based on the input? — jared, Jul 16 '23 at 19:53
@jared Yes, I actually had a mistake and edited it. I'll explain each operation: 1) target=0 so tmp_target=1 2) series != 1 for the first vector gives [T, T, T, F, F, F, T, F] 3) the cumulative sum of this is [1,2,3,3,3,3, 4, 4] 4) when we do series.groupby(cumsum), we get a prettydict: {1: [0], 2: [1], 3: [2, 3, 4, 5], 4: [6, 7]}, which is basically a dictionary of lists with key being cumsum-quantity and the value being the consecutive list of indexes. 5) when we take the cumulative count of this, we get [0,0,0,1,2,3,0,1]. I'll update it with code you can simply copy/paste and run — frankL, Jul 16 '23 at 20:18
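
The five steps in the comment above can be traced one at a time with a short script. This is only an illustration on the first example vector; the variable names mask, group_ids, groups, and counts are assumptions chosen for readability and do not appear in the original function.

```python
import pandas as pd

series = pd.Series([0, 0, 0, 1, 1, 1, 0, 1])

# Step 1: target == 0, so the comparison value is 1
tmp_target = 1

# Step 2: True wherever the element differs from tmp_target
mask = series != tmp_target

# Step 3: the running sum of the mask gives one group id per element
group_ids = mask.cumsum()

# Step 4: group id -> list of index positions belonging to that group
groups = {int(k): [int(i) for i in idx]
          for k, idx in series.groupby(group_ids).groups.items()}

# Step 5: number the elements within each group, starting from 0
counts = series.groupby(group_ids).cumcount()

print(mask.tolist())       # [True, True, True, False, False, False, True, False]
print(group_ids.tolist())  # [1, 2, 3, 3, 3, 3, 4, 4]
print(groups)              # {1: [0], 2: [1], 3: [2, 3, 4, 5], 4: [6, 7]}
print(counts.tolist())     # [0, 0, 0, 1, 2, 3, 0, 1]
```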
