I am interested in doing partial sums for the following dataframe:
ID Name A B
1 111 foo 248 123
2 222 bar 331 94
3 111 foo 266 102
4 111 foo 217 163
5 222 bar 194 102
6 222 bar 188 89
I could use groupby
with either sum
or agg
, such as:
df = df.groupby(["ID", "Name"]).agg(sum).reset_index()
which yields:
ID Name A B
1 111 foo 731 388
2 222 bar 713 285
However, I would like to combine indices only until column A passes some pre-specified value, and then begin a second grouping. When that one passes the pre-specified value, begin a third grouping, and so on. For example, if the threshold is set at 500, the code would yield:
ID Name A B
1 111 foo 514 225
2 222 bar 525 196
3 111 foo 217 163
4 222 bar 188 89
Rows 1 and 3 in the original df were grouped. Rows 2 and 4 were grouped. Row 5 does not group with rows 1 and 3 because the threshold of 500 has been passed. Row 6 was similarly left ungrouped.
Order of rows does not matter. Which rows are combined with which other rows does not matter. I just need the ability to group column values with a threshold. I'm stumped, especially when trying to figure out a Pythonic solution as opposed to iterating through the dataframe row by row and explicitly evaluating each row. Any feedback would be much appreciated.