I have a pandas data frame that I want to group by two columns and then return the cumulative sum of a third column of strings as a list within one of these groups.
Example:
Year Bucket Name
2000 1 A
2001 1 B
2003 1 C
2000 2 B
2002 2 C
The output I want is:
Year Bucket Cum_Sum
2000 1 [A]
2001 1 [A,B]
2002 1 [A,B]
2003 1 [A,B,C]
2000 2 [B]
2001 2 [B]
2002 2 [B,C]
2003 2 [B,C]
I tried to piece together an answer from two responses: https://stackoverflow.com/a/39623235/5143841 https://stackoverflow.com/a/22651188/5143841
But I can't quite get there.