In a dataframe such as

import pandas as pd

df = pd.DataFrame({'A' : ['a','a','b','b','a'],
                   'B' : [1,22,8,3,3]})

I need to partition the data by A, find the max of B in each partition, and save the result in a new column C:

df = pd.DataFrame({'A' : ['a','a','b','b','a'],
                   'B' : [1,22,8,3,3],
                   'C' : [22,22,8,8,22]})

I have tried

df['C'] = df.groupby(['A'])['B'].max()

but this only adds a column of NaN.

moshtaba
  • Use `transform`: `df.assign(C=df.groupby('A').B.transform('max'))`. It is roughly equivalent to SQL's PARTITION BY. – sammywemmy Jun 04 '22 at 01:58

1 Answer

Use `groupby` + `transform`:

df['C'] = df.groupby('A')['B'].transform('max')

Output:

>>> df
   A   B   C
0  a   1  22
1  a  22  22
2  b   8   8
3  b   3   8
4  a   3  22
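
As for why the original attempt gives NaN: `groupby(...).max()` returns one value per group, indexed by the group labels ('a', 'b'), and column assignment aligns on the index, so nothing matches the row labels 0 to 4. A minimal sketch of the difference, where `group_max`, `misaligned` and `C_via_map` are just illustrative names and not part of the question:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b', 'a'],
                   'B': [1, 22, 8, 3, 3]})

# Aggregation: one row per group, indexed by the group labels 'a' and 'b'.
group_max = df.groupby('A')['B'].max()

# Assigning the aggregated Series aligns on index; 'a'/'b' never match 0..4,
# so every value in the new column ends up NaN.
misaligned = df.assign(C=group_max)

# transform broadcasts each group's max back onto the original row index.
df['C'] = df.groupby('A')['B'].transform('max')

# Alternative with the same result: look up each row's group label
# in the aggregated Series.
df['C_via_map'] = df['A'].map(group_max)
```

`transform` is usually the more direct choice here; the `map` variant can be handy if you already have the aggregated Series for other reasons.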