Finding difference between two columns of a dataframe along with groupby

Question

I saw a primitive version of this question here

but i my dataframe has diffrent names and i want to calculate separately for them

   A   B   C
0  a   3   5
1  a   6   9
2  b   3   8
3  b  11  19

i want to groupby A and then find diffence between alternate B and C.something like this

   A   B   C   dA
0  a   3   5   6
1  a   6   9  NaN
2  b   3   8  16
3  b  11  19  NaN

i tried doing

df['dA']=df.groupby('A')(['C']-['B'])
df['dA']=df.groupby('A')['C']-df.groupby('A')['B']

none of them helped what mistake am i making?

how are the values of dA calculated in your example? Where are 6 and 16 coming from? — It_is_Chris, Aug 13 '20 at 20:19
9 from C - 3 from B for a ,similarly 19 from C - 3 from B for b — Vivek Anand, Aug 13 '20 at 20:46
If i just do df['dA']=df['C']-df['B'],it works but dA also contains 2(between 6 and 16) Which is calculated using one data from a and one from b. Which i dont want so i want to use it with groupby — Vivek Anand, Aug 13 '20 at 20:50
are you looking for `df.groupby('A').apply(lambda x: x['C'].shift(-1) - x['B'])`? — Ben.T, Aug 13 '20 at 21:09

score 0 · Answer 1 · answered Aug 13 '20 at 22:04

IIUC, here is one way to perform the calculation:

# create the data frame
from io import StringIO
import pandas as pd

data = '''idx    A   B   C
0  a   3   5
1  a   6   9
2  b   3   8
3  b  11  19
'''

df = pd.read_csv(StringIO(data), sep='\s+', engine='python').set_index('idx')

Now, compute dA. I look last value of C less first value of B, as grouped by A. (Is this right? Or is it max(C) less min(B)?). If you're guaranteed to have the A values in pairs, then @BenT's shift() would be more concise.

dA = (
    (df.groupby('A')['C'].transform('last') - 
     df.groupby('A')['B'].transform('first'))
       .drop_duplicates()
       .rename('dA'))

print(pd.concat([df, dA], axis=1))

     A   B   C    dA
idx                 
0    a   3   5   6.0
1    a   6   9   NaN
2    b   3   8  16.0
3    b  11  19   NaN

I used groupby().transform() to preserve index values, to support the concat operation.

Finding difference between two columns of a dataframe along with groupby

1 Answers1