Pandas cumsum on a separate column condition

Question

I have an issue using Pandas and cumsum which is not behaving as I was expecting so was wondering if anyone could shed some light on how this works.

I have a dataframe that looks as follows:

|     |price    |flag  |cum_sum  |
|-----|---------|------|---------|
|0    |2        |1     |2        |
|1    |5        |1     |7        |
|2    |8        |1     |15       |
|3    |9        |0     |0        |
|4    |12       |0     |0        |
|5    |2        |1     |17       |

Currently the code looks as follows:

df['cum_sum'] = df.groupby(by=['flag','price']).sum().groupby(level=[1]).cumsum()

I only want it to sum a column where a flag is specified. I feel like this should be simple but i'm missing something fundamental. The dataset is huge so was not looking for any loops or iteration answers.

So you only want have a cummulative sum for the `price` column when `flag` is 1? — pansen, Mar 07 '17 at 22:21
if you wish to sum() the price, did you try "df.groupby('flag')['price'].cumsum()"? your part of query after the sum() seems to be irrelevant to me. — Siraj S., Mar 07 '17 at 22:21

score 1 · Accepted Answer · answered Mar 07 '17 at 22:55

Looks like you have most of what you want. If you want null values (NaN) for 0-flags then do this:

df['cum_sum'] = df[df['flag'] == 1]['price'].cumsum()

flag price cum_sum 0 1 2 2.0 1 1 5 7.0 2 1 8 15.0 3 0 9 NaN 4 0 12 NaN 5 1 2 17.0

score 0 · Answer 2 · answered Mar 07 '17 at 22:29

0

Is this what you want?

In [15]: df.price.mul(df.flag).cumsum().mul(df.flag)
Out[15]:
0     2
1     7
2    15
3     0
4     0
5    17
dtype: int64

answered Mar 07 '17 at 22:29

MaxU - stand with Ukraine

205,989
36
386
419

Pandas cumsum on a separate column condition

2 Answers2

Linked