0

I have a Pandas Dataframe with one column called [A] and different values:

[A]
1
1
4
5
6
7
5
4
1
1
1

I would like to create another column -called B- and sum the values of A when A is > 2. If this condition is not true the result should be 0.

Finally the result of my example should be:

[B]
0
0
4
9
15
22
27
31
0
0
0

Doing this in Excel is pretty straightforward because you can sum the value of [A] to the previous [B] value. But I don’t know how to do it with Python. I have tried:

DF['B'] = np.where(DF['A'] > 2, DF['A'] + DF['B'].shift(-1), 0)

But it does not work.

Any help would be greatly appreciated.

Thanks!

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
Martingale
  • 511
  • 1
  • 6
  • 15
  • Also note that there is a similar question: [Cumsum reset at NaN](https://stackoverflow.com/questions/18196811/cumsum-reset-at-nan) – Snow bunting Apr 08 '20 at 11:02

1 Answers1

1

This method is called "cumulative sum" and is implemented in pandas as .cumsum() (here is the documentation).

For the >2 condition there are multiple options, and I'm sure there is a more elegant way, but my choice is to first use .clip(lower=2) (doc) to replace all values <2 with 2 and then .replace(2,0) (doc) to replace all 2s with 0. (You can use only a single .replace([1,2], 0) if you have only non-negative integers)

Working example:

import pandas as pd
x = pd.DataFrame({'A': [1,1,4,5,6,7,5,4,1,1,1]})
x['B'] = x.loc[:,'A'].clip(lower=2).replace(2, 0).cumsum()
print(x)

output values: B: 0, 0, 4, 9, 15, 22, 27, 31, 31, 31, 31

Edit (see comments):

This is a hack to reset the cumsum whenever there is a value <2:

x['C'] = 0
for i in range(1, x.shape[0]):
    if x.loc[i, 'A'] > 2:
        z = x.loc[i-1, 'C'] if i else 0    # fix for i==0
        x.loc[i, 'C'] = z + x.loc[i, 'A']
print(x)

output values: C: 0, 0, 4, 9, 15, 22, 27, 31, 0, 0, 0

Snow bunting
  • 1,120
  • 8
  • 28
  • Thanks Snow, the problem with this code is that the value is not reset to 0 when ['A'] < 2. Notice that the last three values of my hypothetical ['B'] column are 0. – Martingale Apr 08 '20 at 10:37
  • Do you mean a series `A` like `1, 1, 3, 4, 1, 5, 6, 1` should create `0, 0, 3, 7, 0, 5, 11, 0`? Sorry, did miss that in your question. – Snow bunting Apr 08 '20 at 10:44
  • Exactly! I try to sum the current value of A to my previous stored value in B only when A > 2. – Martingale Apr 08 '20 at 10:49
  • oh I see. I mean you could do it with simple manual loop, but I'll update the answer shortly. Maybe I have a cleverer idea. – Snow bunting Apr 08 '20 at 10:50