3

I want to replace column value of dataframe with mean(without zeros) value of column grouped by another column.

Dataframe df is like:

ID | TYPE | rate
-------------
1  |    A | 0 <- Replace this
2  |    B | 2
3  |    C | 1
4  |    A | 2
5  |    C | 1
6  |    C | 0 <- Replace this
7  |    C | 8
8  |    C | 2
9  |    D | 0 <- Replace this

I have to replace values in rating where rating = 0:

df['rate'][df['rate']==0] = ?

with average value for that TYPE.

Average(without zeros) value for every type is:

A = 2/1 = 2

B = 2/1 = 2

C = (1 + 1 + 8 + 2)/4 = 3

D = 0 (default value when there isn't information for type)

Expected result:

ID | TYPE | rate
-------------
1  |    A | 2 <- Changed
2  |    B | 2
3  |    C | 1
4  |    A | 2
5  |    C | 1
6  |    C | 3 <- Changed
7  |    C | 8
8  |    C | 2
9  |    D | 0 <- Changed
Community
  • 1
  • 1
Morticia A. Addams
  • 363
  • 1
  • 7
  • 19
  • 1
    Be careful with this `df['rate'][df['rate']==0] = ...`, always use this instead `df.loc[df['rate']==0, 'rate'] = ...`, see for instance [here](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas). – IanS Jun 04 '19 at 12:26

1 Answers1

3

You could mask the rate column in the dataframe, GroupBy the TYPE and transform with the mean, which will exlude NaNs. The use fillna to replace the values in the masked dataframe:

ma = df.rate.mask(df.rate.eq(0))
df['rate'] = ma.fillna(ma.groupby(df.TYPE).transform('mean').fillna(0))

   ID  TYPE  rate
0   1    A   2.0
1   2    B   2.0
2   3    C   1.0
3   4    A   2.0
4   5    C   1.0
5   6    C   3.0
6   7    C   8.0
7   8    C   2.0
8   9    D   0.0
yatu
  • 86,083
  • 12
  • 84
  • 139
  • That... seems wrong, you just end up with essentially the mean everywhere. – IanS Jun 04 '19 at 12:32
  • OP wants to replace values where the `rate` is 0 with the mean of the corresponding group exluding 0s. That's what is being done here @IanS – yatu Jun 04 '19 at 12:33