Date AccountRef Amount Type
0 2019-04-15 REF0005 25000.00 cashflow_in
1 2019-04-15 REF004 20000.00 cashflow_in
2 2019-06-07 REF005 2000.00 cashflow_in
3 2019-07-24 REF005 3000.00 cashflow_in
4 2019-07-25 REF004 5000.00 cashflow_out
5 2019-08-08 REF004 5000.00 cashflow_in
6 2019-09-10 REF004 5000.00 cashflow_out
7 2019-10-13 REF005 5000.00 cashflow_out
8 2019-10-30 REF006 5000.00 cashflow_in
9 2019-11-08 REF009 2500.00 cashflow_out
10 2019-11-23 REF005 5000.00 cashflow_in
11 2019-11-30 REF011 5000.00 cashflow_out
Above is roughly my data set and I'm been trying to create a new column that sums of "Amount" if "AccountRef" is the same (so yes, there will be duplicate values), if type = cashflow_in, and date is after a date (eg using Date = 25/07/2020) for example).
I'll also be be another another column after this but the calculation method should be similar so I should be able to replace the filters/function again.
I've tried things like
> df = df[df['Type'] == 'cashflow_in']
> df['CumluativeIn'] = df.apply(lambda row: df[df['AccountRef'] == row['AccountRef']]['Amount'].sum(), axis=1)
but it doesn't really work the way it should. It's based on a SumIF command from excel.
Edit as I got advised to do it. My desire output is a value (or more precisely "sum") in each row where it meets those criteria (before a certain date, sum of when all values are the same as in the same row, and when type - cashflow_in).
The second weird criteria but afterwards I'll be adding a second column and varying the dates, sorry about my questions!
Edit 2: To show an example of what I want Date before 2019-11-01
Date AccountRef Amount Type Cumulative_In
0 2019-04-15 REF0005 25000.00 cashflow_in 30000.00
1 2019-04-15 REF004 20000.00 cashflow_in 25000.00
2 2019-06-07 REF005 2000.00 cashflow_in 30000.00
3 2019-07-24 REF005 3000.00 cashflow_in 30000.00
4 2019-07-25 REF004 5000.00 cashflow_out 25000.00
5 2019-08-08 REF004 5000.00 cashflow_in 25000.00
6 2019-09-10 REF004 5000.00 cashflow_out 25000.00
7 2019-10-13 REF005 5000.00 cashflow_out 30000.00
8 2019-10-30 REF006 5000.00 cashflow_in 5000.00
9 2019-11-08 REF009 2500.00 cashflow_out 0.00
10 2019-11-23 REF005 5000.00 cashflow_in 30000.00
11 2019-11-30 REF011 5000.00 cashflow_out 0.00
In the world of Excel (assuming 0 = A2 and F2 is first data point in the new column), this would be New Column = Sumifs{Amount Column,
or more accurately
F2 = Sumifs{$D$2:$D$12,$C$2:$C$12,C2,$B$2:$B$12,< "2019-11-01",$F$2:$F$12 = "cashflow_in"}
Then F3 becomes F3 = Sumifs{$D$2:$D$12,$C$2:$C$12,C3,$B$2:$B$12,< "2019-11-01",$F$2:$F$12 = "cashflow_in"} F4 = Sumifs{$D$2:$D$12,$C$2:$C$12,C4,$B$2:$B$12,< "2019-11-01",$F$2:$F$12 = "cashflow_in"}
to work so here's an image of what I was trying to say. https://i.imgur.com/HKdkiAG.png – Lazar Clarent Jul 26 '20 at 09:33