Add column with average value grouped by column

Question

I want to replace column value of dataframe with mean(without zeros) value of column grouped by another column.

Dataframe df is like:

ID | TYPE | rate
-------------
1  |    A | 0 <- Replace this
2  |    B | 2
3  |    C | 1
4  |    A | 2
5  |    C | 1
6  |    C | 0 <- Replace this
7  |    C | 8
8  |    C | 2
9  |    D | 0 <- Replace this

I have to replace values in rating where rating = 0:

df['rate'][df['rate']==0] = ?

with average value for that TYPE.

Average(without zeros) value for every type is:

A = 2/1 = 2

B = 2/1 = 2

C = (1 + 1 + 8 + 2)/4 = 3

D = 0 (default value when there isn't information for type)

Expected result:

ID | TYPE | rate
-------------
1  |    A | 2 <- Changed
2  |    B | 2
3  |    C | 1
4  |    A | 2
5  |    C | 1
6  |    C | 3 <- Changed
7  |    C | 8
8  |    C | 2
9  |    D | 0 <- Changed

Be careful with this `df['rate'][df['rate']==0] = ...`, always use this instead `df.loc[df['rate']==0, 'rate'] = ...`, see for instance [here](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas). — IanS, Jun 04 '19 at 12:26

yatu · Accepted Answer · 2019-06-04T12:35:59.467

3

You could mask the rate column in the dataframe, GroupBy the TYPE and transform with the mean, which will exlude NaNs. The use fillna to replace the values in the masked dataframe:

ma = df.rate.mask(df.rate.eq(0))
df['rate'] = ma.fillna(ma.groupby(df.TYPE).transform('mean').fillna(0))

   ID  TYPE  rate
0   1    A   2.0
1   2    B   2.0
2   3    C   1.0
3   4    A   2.0
4   5    C   1.0
5   6    C   3.0
6   7    C   8.0
7   8    C   2.0
8   9    D   0.0

edited Jun 04 '19 at 12:35

answered Jun 04 '19 at 12:28

yatu

86,083
12
84
139

That... seems wrong, you just end up with essentially the mean everywhere. – IanS Jun 04 '19 at 12:32
OP wants to replace values where the `rate` is 0 with the mean of the corresponding group exluding 0s. That's what is being done here @IanS – yatu Jun 04 '19 at 12:33

Add column with average value grouped by column

1 Answers1