pandas DataFrame: Calculate Sum based on boolean values in another column

Question

I am fairly new to Python and I trying to simulate the following logic with in pandas

I am currently looping throw the rows and want to sum the values in the AMOUNT column in the prior rows but only till the last seen 'TRUE' value. It seems inefficient with the actual data (I have a dataframe of about 5 million rows)? Was wondering what the efficient way of handling such a logic in Python would entail?

Logic: The logic is that if FLAG is TRUE I want to sum the values in the AMOUNT column in the prior rows but only till the last seen 'TRUE' value. Basically sum the values in 'AMOUNT' between the rows where FLAG is TRUE

Please don't post screenshots. See this link: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples Also, what is "such a logic"? What specifically is the logic or an example of the logic? I.E. what is the condition, that made you sum those specific four rows? Is it take the sum rows of True up to the previous row of the next True? — David Erickson, Jun 06 '20 at 21:27
Yes, sorry. The logic is that if FLAG is TRUE I want to sum the values in the AMOUNT column in the prior rows but only till the last seen 'TRUE' value. Basically sum the values in 'AMOUNT' between the rows where FLAG is TRUE — Nandini, Jun 06 '20 at 21:34

score 4 · Accepted Answer · answered Jun 06 '20 at 21:27

4

Check with cumsum and transform sum

df['SUM']=df.groupby(df['FLAG'].cumsum()).Amount.transform('sum').where(df.FLAG)

answered Jun 06 '20 at 21:27

BENY

317,841
20
164
234

Ark Lomas · Answer 2 · 2020-06-06T21:39:59.230

1

maybe try something around the following:

import pandas

df = pd.read_csv('name of file.csv')

df['AMOUNT'].sum()

edited Jun 06 '20 at 21:39

answered Jun 06 '20 at 21:34

Ark Lomas

153
1
12

pandas DataFrame: Calculate Sum based on boolean values in another column

2 Answers2