Trying to create a new column that sums multiple criteria in python

Question

          Date AccountRef      Amount       Type
0   2019-04-15   REF0005  25000.00   cashflow_in
1   2019-04-15   REF004  20000.00   cashflow_in
2   2019-06-07   REF005   2000.00   cashflow_in
3   2019-07-24   REF005   3000.00   cashflow_in
4   2019-07-25   REF004   5000.00   cashflow_out
5   2019-08-08   REF004   5000.00   cashflow_in
6   2019-09-10   REF004   5000.00   cashflow_out
7   2019-10-13   REF005   5000.00   cashflow_out
8   2019-10-30   REF006   5000.00   cashflow_in
9   2019-11-08   REF009   2500.00   cashflow_out
10  2019-11-23   REF005   5000.00   cashflow_in
11  2019-11-30   REF011   5000.00   cashflow_out

Above is roughly my data set and I'm been trying to create a new column that sums of "Amount" if "AccountRef" is the same (so yes, there will be duplicate values), if type = cashflow_in, and date is after a date (eg using Date = 25/07/2020) for example).

I'll also be be another another column after this but the calculation method should be similar so I should be able to replace the filters/function again.

I've tried things like

> df = df[df['Type'] == 'cashflow_in'] 
> df['CumluativeIn'] = df.apply(lambda row: df[df['AccountRef'] == row['AccountRef']]['Amount'].sum(), axis=1)

but it doesn't really work the way it should. It's based on a SumIF command from excel.

Edit as I got advised to do it. My desire output is a value (or more precisely "sum") in each row where it meets those criteria (before a certain date, sum of when all values are the same as in the same row, and when type - cashflow_in).

The second weird criteria but afterwards I'll be adding a second column and varying the dates, sorry about my questions!

Edit 2: To show an example of what I want Date before 2019-11-01

          Date AccountRef      Amount       Type Cumulative_In
0   2019-04-15   REF0005  25000.00   cashflow_in 30000.00
1   2019-04-15   REF004  20000.00   cashflow_in 25000.00
2   2019-06-07   REF005   2000.00   cashflow_in 30000.00
3   2019-07-24   REF005   3000.00   cashflow_in 30000.00
4   2019-07-25   REF004   5000.00   cashflow_out 25000.00
5   2019-08-08   REF004   5000.00   cashflow_in 25000.00
6   2019-09-10   REF004   5000.00   cashflow_out 25000.00
7   2019-10-13   REF005   5000.00   cashflow_out 30000.00
8   2019-10-30   REF006   5000.00   cashflow_in 5000.00
9   2019-11-08   REF009   2500.00   cashflow_out 0.00
10  2019-11-23   REF005   5000.00   cashflow_in 30000.00
11  2019-11-30   REF011   5000.00   cashflow_out 0.00

In the world of Excel (assuming 0 = A2 and F2 is first data point in the new column), this would be New Column = Sumifs{Amount Column,

or more accurately

F2 = Sumifs{$D$2:$D$12,$C$2:$C$12,C2,$B$2:$B$12,< "2019-11-01",$F$2:$F$12 = "cashflow_in"}

Then F3 becomes F3 = Sumifs{$D$2:$D$12,$C$2:$C$12,C3,$B$2:$B$12,< "2019-11-01",$F$2:$F$12 = "cashflow_in"} F4 = Sumifs{$D$2:$D$12,$C$2:$C$12,C4,$B$2:$B$12,< "2019-11-01",$F$2:$F$12 = "cashflow_in"}

Check out the groupby function in the pandas documentation. Also - please provide a sample output so we know what the goal is. — Mike67, Jul 25 '20 at 21:22
Ah sorry my bad, I apoligise as I am still a bit new at this. I'll have a look a groupby and see if I can get it to work and write a comment about it — Lazar Clarent, Jul 25 '20 at 21:54
Does this answer your question? [How to pivot a dataframe](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) — RichieV, Jul 26 '20 at 04:14
'Is 'REF0005' correct?' Why does 'REF0005' become 30,000 and why does 'REF004' also become 25,000? — r-beginners, Jul 26 '20 at 09:03
It adds 25000.00 from rows 2 2019-06-07 REF005 2000.00 cashflow_in 30000.00 3 2019-07-24 REF005 3000.00 cashflow_in 30000.00 but not (due to "cashflow_out") 7 2019-10-13 REF005 5000.00 cashflow_out 30000.00 and (due to date) 10 2019-11-23 REF005 5000.00 cashflow_in 30000.00 — Lazar Clarent, Jul 26 '20 at 09:10
I had trouble with trying to get two spaces to
to work so here's an image of what I was trying to say. https://i.imgur.com/HKdkiAG.png — Lazar Clarent, Jul 26 '20 at 09:33

score 0 · Answer 1 · answered Jul 26 '20 at 04:10

0

I may not understand the intent of your question. If you want to calculate the totals by account, you can achieve this with the following code. Does this meet your intent?

df['CumluativeIn'] = df.groupby('AccountRef')['Amount'].cumsum()

df
Date    AccountRef  Amount  Type    CumluativeIn
0   2019-04-15  REF0005 25000.0 cashflow_in 25000.0
1   2019-04-15  REF004  20000.0 cashflow_in 20000.0
2   2019-06-07  REF005  2000.0  cashflow_in 2000.0
3   2019-07-24  REF005  3000.0  cashflow_in 5000.0
5   2019-08-08  REF004  5000.0  cashflow_in 25000.0
7   2019-10-13  REF005  5000.0  cashflow_in 10000.0
8   2019-10-30  REF006  5000.0  cashflow_in 5000.0
10  2019-10-23  REF007  5000.0  cashflow_in 5000.0

answered Jul 26 '20 at 04:10

r-beginners

31,170
3
14
32

First of all, thank you very much for your reply. I think that's part of it but I need it to do more than totals by account. I need it to also have a filter where it only sums total by account if the date in column 1 is before a certain date and where Type = cashflow_in. I'm struggling for multiple criteria selection. – Lazar Clarent Jul 26 '20 at 07:55
I'm sorry we can't meet your request. I think you could get a quicker answer if you could better organize and clarify the conditions and the desired output. – r-beginners Jul 26 '20 at 08:02
No, you guys have been more than helpful for me. It is my fault for not being that clear. I tried to edit to clarify my desired output (I tried to post in this comment but it is too long). – Lazar Clarent Jul 26 '20 at 08:49

score 0 · Answer 2 · answered Jul 27 '20 at 14:12

After much fiddling, talking to my co workers, and such - I think I got it to work so I thought I'll post the answer here.

df_grouped = df.groupby('AccountRef')

facilities = df_grouped.groups.keys()
for facility in facilities:
    df_filtered = df_grouped.get_group(facility)
    df_filtered['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
    df_filtered = df_filtered.sort_values(by='Date')
    df_repayments = df_filtered[df_filtered['Type'] == 'cashflow_in']
    df_drawdowns = df_filtered[df_filtered['Type'] == 'cashflow_out']
    total_cashflow_in = df_cashflow_in['Amount'].sum()
    print(total_cashflow_in)
    df_drawdowns.reset_index(drop=True, inplace=True)
    for index, row in df_cashflow_out.iterrows():
        cashflow_out = row['Amount']
        current_value = cashflow_out - total_cashflow_in
        if current_value < 0:
            current_value = 0
            total_cashflow_in = total_cashflow_in - cashflow_out
        else:
            total_cashflow_in = 0
        df_cashflow_out.loc[index, 'current_value'] = current_value

print(df)

Trying to create a new column that sums multiple criteria in python

2 Answers2