You should use .apply()
instead of .transform()
, as follows:
df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())
By using .apply()
, the x
variables in the lambda
function will be recognised as DataFrame
objects representing each group, which can each be indexed for individual columns.
The .transform()
method only treats x
as a Series
object representing only one column at a time.
From this answer:
Two major differences between `apply` and `transform`
There are two major differences between the transform
and apply
groupby methods.
- Input:
apply
implicitly passes all the columns for each group as a DataFrame to the custom function.
- while
transform
passes each column for each group individually as a Series to the custom function.
- Output:
- The custom function passed to
apply
can return a scalar, or a Series or DataFrame (or numpy array or even list).
- The custom function passed to
transform
must return a sequence (a one dimensional Series, array or list) the same length as the group.
So, transform
works on just one Series at a time and apply
works on the entire DataFrame at once.
Example
Sample data:
import pandas as pd
data = {
'date': ['01/01/1999', '02/01/1999', '03/01/1999', '03/01/1999'],
'A': [2, 4, 7, 4],
'B': [5, 7, 9, 6]
}
df = pd.DataFrame(data)
print(df)
date A B
0 01/01/1999 2 5
1 02/01/1999 4 7
2 03/01/1999 7 9
3 03/01/1999 4 6
The calculation for '03/01/1999'
would be:
((7 * 9) + (4 * 6)) / (9 + 6) # = 5.8
Calculation for each date group using .apply()
:
df_ab_calc = df.groupby('date').apply(lambda x: (x['A'] * x['B']).sum() / x['B'].sum())
print(df_ab_calc)
date
01/01/1999 2.0
02/01/1999 4.0
03/01/1999 5.8