0

I have an issue using Pandas and cumsum which is not behaving as I was expecting so was wondering if anyone could shed some light on how this works.

I have a dataframe that looks as follows:

|     |price    |flag  |cum_sum  |
|-----|---------|------|---------|
|0    |2        |1     |2        |
|1    |5        |1     |7        |
|2    |8        |1     |15       |
|3    |9        |0     |0        |
|4    |12       |0     |0        |
|5    |2        |1     |17       |

Currently the code looks as follows:

df['cum_sum'] = df.groupby(by=['flag','price']).sum().groupby(level=[1]).cumsum()

I only want it to sum a column where a flag is specified. I feel like this should be simple but i'm missing something fundamental. The dataset is huge so was not looking for any loops or iteration answers.

Psidom
  • 209,562
  • 33
  • 339
  • 356
robmsmt
  • 1,389
  • 11
  • 19

2 Answers2

1

Looks like you have most of what you want. If you want null values (NaN) for 0-flags then do this:

df['cum_sum'] = df[df['flag'] == 1]['price'].cumsum()

flag price cum_sum 0 1 2 2.0 1 1 5 7.0 2 1 8 15.0 3 0 9 NaN 4 0 12 NaN 5 1 2 17.0

0

Is this what you want?

In [15]: df.price.mul(df.flag).cumsum().mul(df.flag)
Out[15]:
0     2
1     7
2    15
3     0
4     0
5    17
dtype: int64
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419