I have a time sequential grouped table in Pandas DF. I am trying to create a running sum within groups, conditional upon running sum can not be negative, i.e. column cell value resets to zero when running sum turns negative, and continue running sum calculation to preserve integrity and data quality.
I've used the cumsum() with groupby() functions, but can not seem to reset cumulative values to zero on the fly. I've also looked into below thread, but that related to slightly different problem:
Pandas dataframe - running sum with reset
Below is the code (with sample data that I've been working with):
group = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
values = [-5, 4, 2, -4, -2, -2, 5, -2, 3, 6, 7, -11]
df = pd.DataFrame(list(zip(group, values)), columns =['group', 'val'])
df['cumsum_output'] = df.groupby('group')['val'].cumsum()
df['expected_out'] = [0, 4, 6, 2, 0, 0, 5, 3, 3, 9, 16, 5]
group val cumsum_output expected_out
0 A -5 -5 0
1 A 4 -1 4
2 A 2 1 6
3 A -4 -3 2
4 A -2 -5 0
5 B -2 -2 0
6 B 5 3 5
7 B -2 1 3
8 C 3 3 3
9 C 6 9 9
10 C 7 16 16
11 C -11 5 5
['cumsum_output' is what I get from my current code; and 'expected_out' is what ideally I'd like to achieve.] Appreciate any help and suggestion from anyone with better ideas and suggestions in resolving this :-)
Cheers,
Nad