To obtain results executed on groupby-data with the same level of detail as the original DataFrame (same observation count) I have used the transform function.
Example: Original dataframe
name, year, grade
Jack, 2010, 6
Jack, 2011, 7
Rosie, 2010, 7
Rosie, 2011, 8
After groupby transform
name, year, grade, average grade
Jack, 2010, 6, 6.5
Jack, 2011, 7, 6.5
Rosie, 2010, 7, 7.5
Rosie, 2011, 8, 7.5
However, with more advanced functions based on multiple columns things get more complicated. What puzzles me is that I seem to be unable to access multiple columns in a groupby-transform combination.
df = pd.DataFrame({'a':[1,2,3,4,5,6],
'b':[1,2,3,4,5,6],
'c':['q', 'q', 'q', 'q', 'w', 'w'],
'd':['z','z','z','o','o','o']})
def f(x):
y=sum(x['a'])+sum(x['b'])
return(y)
df['e'] = df.groupby(['c','d']).transform(f)
Gives me:
KeyError: ('a', 'occurred at index a')
Though I know that following does work:
df.groupby(['c','d']).apply(f)
What causes this behavior and how can I obtain something like this:
a b c d e
1 1 q z 12
2 2 q z 12
3 3 q z 12
4 4 q o 8
5 5 w o 22
6 6 w o 22