3

I have a dataframe df:

import pandas as pd    
df = pd.DataFrame({'A': [1, 1, 1,2,2,2,2], 
               'B': [10, 0, 0,5,0,0,0], 
               'C': [1,1,2,2,3,3,3],
               'D': [2,3,4,5,2,3,4]})

which looks like:

   A   B  C  D
0  1  10  1  2
1  1   0  1  3
2  1   0  2  4
3  2   5  2  5
4  2   0  3  2
5  2   0  3  3
6  2   0  3  4

I want to compute the value in column B only for those locations where it is 0 for all groups (1,2 as per example data) denoted in column A.

value of column B = value of column B in previous record + value of col C in same record + value of col D in same record.

My expected output is:

   A   B  C  D
0  1  10  1  2
1  1  14  1  3
2  1  20  2  4
3  2   5  2  5
4  2  10  3  2
5  2  16  3  3
6  2  23  3  4

How can I do it in pandas ?

ASGM
  • 11,051
  • 1
  • 32
  • 53
Archit
  • 542
  • 1
  • 4
  • 15

2 Answers2

4

This should do it:

def f(g):
    g.B = (g.B.shift() + g.C + g.D).cumsum()
    return g

df.B.replace(0, df.groupby('A').apply(f).B)

The result is:

   A   B  C  D
0  1  10  1  2
1  1  14  1  3
2  1  20  2  4
3  2   5  2  5
4  2  10  3  2
5  2  16  3  3
6  2  23  3  4
ASGM
  • 11,051
  • 1
  • 32
  • 53
  • 1
    More info :-) https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – BENY May 10 '19 at 18:54
  • Hey @WeNYoBen, thanks for the link - that's really informative! My reasoning in this answer was that, in the words of [jpp's answer](https://stackoverflow.com/a/54433552/2074981): "`GroupBy.apply` is usually fine here, provided the methods you use in your custom function are themselves vectorised". But it's a really useful discussion for anyone using `apply`. – ASGM May 10 '19 at 20:07
  • Yw:-) happy coding – BENY May 10 '19 at 20:08
1

IIUC two cumsum

s=df[['C','D']].mask(df.B!=0)

df.groupby('A').B.cumsum()+s.groupby(df.A).cumsum().sum(1)
Out[633]: 
0    10.0
1    14.0
2    20.0
3     5.0
4    10.0
5    16.0
6    23.0
dtype: float64
BENY
  • 317,841
  • 20
  • 164
  • 234