Adding a summarised column back into a dataframe in python

Question

as part of some data cleansing, i want to add the mean of a variable back into a dataframe to use if the variable is missing for a particular observation. so i've calculated my averages as follows

avg=all_data2.groupby("portfolio")"[sales"].mean().reset_index(name="sales_mean")

now I wanted to add that back into my original dataframe using a left join, but it doesnt appear to be working. what format is my avg now? I thought it would be a dataframe but is it something else?

constantstranger · Accepted Answer · 2022-05-09T18:11:14.460

UPDATED

This is probably the most succinct way to do it:

all_data2.sales = all_data2.sales.fillna(all_data2.groupby('portfolio').sales.transform('mean'))

This is another way to do it:

all_data2['sales'] = all_data2[['portfolio', 'sales']].groupby('portfolio').transform(lambda x: x.fillna(x.mean()))

Output:

   portfolio  sales
0          1   10.0
1          1   20.0
2          2   30.0
3          2   40.0
4          3   50.0
5          3   60.0
6          3    NaN
   portfolio  sales
0          1   10.0
1          1   20.0
2          2   30.0
3          2   40.0
4          3   50.0
5          3   60.0
6          3   55.0

To answer your the part of your question that reads "what format is my avg now? I thought it would be a dataframe but is it something else?", avg is indeed a dataframe but using it may not be the most direct way to update missing data in the original dataframe. The dataframe avg looks like this for the sample input data above:

   portfolio  sales_mean
0          1        15.0
1          2        35.0
2          3        55.0

A related SO question that you may find helpful is here.

@Denis Mclaughlin Do you need additional help with your question? — constantstranger, May 08 '22 at 04:27

score 0 · Answer 2 · edited May 08 '22 at 03:18

0

If you want to add a new column, you can use this code:

df['sales_mean']=df[['sales_1','sales_2']].mean(axis=1)

edited May 08 '22 at 03:18

answered May 06 '22 at 14:49

Mr.F.K

21
1
3

does that work if I want the mean split by a different categorical variable? – Denis Mclaughlin May 06 '22 at 15:10

Adding a summarised column back into a dataframe in python

2 Answers2