1

I saw a primitive version of this question here

but i my dataframe has diffrent names and i want to calculate separately for them

   A   B   C
0  a   3   5
1  a   6   9
2  b   3   8
3  b  11  19

i want to groupby A and then find diffence between alternate B and C.something like this

   A   B   C   dA
0  a   3   5   6
1  a   6   9  NaN
2  b   3   8  16
3  b  11  19  NaN

i tried doing

df['dA']=df.groupby('A')(['C']-['B'])
df['dA']=df.groupby('A')['C']-df.groupby('A')['B']

none of them helped what mistake am i making?

Vivek Anand
  • 381
  • 1
  • 10

1 Answers1

0

IIUC, here is one way to perform the calculation:

# create the data frame
from io import StringIO
import pandas as pd

data = '''idx    A   B   C
0  a   3   5
1  a   6   9
2  b   3   8
3  b  11  19
'''

df = pd.read_csv(StringIO(data), sep='\s+', engine='python').set_index('idx')

Now, compute dA. I look last value of C less first value of B, as grouped by A. (Is this right? Or is it max(C) less min(B)?). If you're guaranteed to have the A values in pairs, then @BenT's shift() would be more concise.

dA = (
    (df.groupby('A')['C'].transform('last') - 
     df.groupby('A')['B'].transform('first'))
       .drop_duplicates()
       .rename('dA'))

print(pd.concat([df, dA], axis=1))

     A   B   C    dA
idx                 
0    a   3   5   6.0
1    a   6   9   NaN
2    b   3   8  16.0
3    b  11  19   NaN

I used groupby().transform() to preserve index values, to support the concat operation.

jsmart
  • 2,921
  • 1
  • 6
  • 13