-1

I'm trying to make multiple groupby with pandas, but I encounter errors. I want to use both Date and category column.

What I want to do; to find the average of the returns of different funds in the same category on the same date. For example, On 2019-03 x category mean return is 0.025.

My sample python code:

df['benchmark_mean']= df.groupby(["category","date"])["fund_ret"].mean()

My Sample DataFrame:

date fund_ticker category fund_ret nav
2019-02 AAA x 0.05 1000
2019-03 AAA x 0.03 1030
2019-02 BBB y -0.01 ...
2019-03 BBB y 0.07 ...
2019-03 CCC x 0.02 ...
Clankk
  • 13
  • 4
  • 2
    What is wrong with your sample code? – Amit Vikram Singh Apr 20 '21 at 13:53
  • ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long' "incompatible index of inserted column with frame index" TypeError: incompatible index of inserted column with frame index – Clankk Apr 20 '21 at 20:18

1 Answers1

0

You probably need to create a new column called benchmark_mean that has the mean per group?, then you could do as below:

df.groupby(["category","date"],as_index=False).agg(
       benchmark_mean = pd.NamedAgg(column = 'fund_ret', aggfunc = 'mean'))

output:

    category       date benchmark_mean
0          x    2019-02          0.050
1          x    2019-03          0.025
2          y    2019-02         -0.010
3          y    2019-03          0.070

If you want a new column in the original dataframe with no change in rows, you can use groupby with transform as below:

df['benchmark_mean'] = df.groupby(['category', 'date'])['fund_ret'].transform('mean')

output of df after :

       date fund_ticker    nav   category   fund_ret    benchmark_mean
0   2019-02         AAA 1000.0         x        0.05             0.050
1   2019-03         AAA 1030.0         x        0.03             0.025
2   2019-02         BBB    NaN         y       -0.01            -0.010
3   2019-03         BBB    NaN         y        0.07             0.070
4   2019-03         CCC    NaN         x        0.02             0.025

Why we cannot use "apply" instead of "transform"?

You can use apply instead of transform definitely and you can get the same result as below:

df_bm = df.groupby(['date', 'category']).apply(lambda x : pd.Series(x['fund_ret'].mean(), ['benchmark_mean'])).reset_index()

df = df.merge(df_bm, on = ['date', 'category'], how = 'left')

You may want to look at this beautiful answer to look at differences between apply and transform.

How to get weighted mean?

I m going to use apply because transform works only on one Series at a time so I cannot use [nav, fund_ret] after groupby. The code will be :

def weighted_average(x):
    x['nav'] = x['nav'].replace(np.nan, 0)
    if x['nav'].nunique() == 1 and x['nav'].iloc[0] == 0:
        return 0
    return np.average(x['fund_ret'], weights=x['nav'])

df1 = df.groupby(['date', 'category'],as_index=False).apply(lambda x: pd.Series(weighted_average(x), ['weighted_mean']))['weighted_mean'].reset_index()
df.merge(df1, on = ['date', 'category'], how = 'left')

Output (after both the applys):

       date fund_ticker    nav category fund_ret    benchmark_mean  weighted_mean
0   2019-02        AAA  1000.0       x      0.05             0.050           0.05
1   2019-03        AAA  1030.0       x      0.03             0.025           0.03
2   2019-02        BBB     NaN       y     -0.01            -0.010            NaN
3   2019-03        BBB     NaN       y      0.07             0.070            NaN
4   2019-03        CCC     NaN       x      0.02             0.025           0.03
SomeDude
  • 13,876
  • 5
  • 21
  • 44
  • yes the calculation is correct but this code creates a new dataframe. I want to create a column next to the old dataframe and the number of rows should not change. For example, I use the following code in R language to get the result I want: ```r df<-df%>% group_by(date,category) %>% mutate(benchmark_ret = mean(fund_ret)) df<-df%>% group_by(date,category) %>% mutate(w_benchmark_ret = weighted.mean(fund_ret,nav)) ``` – Clankk Apr 20 '21 at 20:05
  • Your last code is running without error and running perfectly. It was exactly what I was looking for, thank you. I will have two last questions, the first one is how can I make weighted average? The conditions for a weighted average group by are still the same, using NAV and fund_ret. My second question is why did we use transform instead of apply? – Clankk Apr 22 '21 at 18:59
  • thank you for everything but merge line isnt working for me KeyError: 'date' – Clankk Apr 23 '21 at 19:32
  • You need to have 'date' column in your dataframe – SomeDude Apr 23 '21 at 22:12