I want to impute the blank values in my dataframe with the median. The dataframe looks like this:

ID Salary Position
1  10     VP
2         VP
3  5      VP
4  15     AVP
5  20     AVP
6         AVP

The blank salaries have to be replaced by the median for that position. For example, the blank salary for ID = 2 (position VP) should be imputed with the median of position VP, which is 5, and the blank for AVP should be imputed in the same way.

I have used the following code, but it fills with the overall median rather than the median for each position:

impute_median=df['Salary'].median()
df['Salary']=df['Salary'].fillna(impute_median)

The output should look like this :

   ID Salary Position
   1      10     VP
   2      5      VP
   3      5      VP
   4      15     AVP
   5      20     AVP
   6      15     AVP
Django0602

2 Answers

To fill with the median of each position, use:

df['Salary'] = df['Salary'].fillna(df.groupby('Position').Salary.transform('median'))
print(df)
   ID  Salary Position
0   1    10.0       VP
1   2     7.5       VP
2   3     5.0       VP
3   4    15.0      AVP
4   5    20.0      AVP
5   6    17.5      AVP
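
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is an assumption rebuilt from the table in the question (blank salaries taken to be NaN); the fill itself is the same `transform('median')` call as above, just split into two steps.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blank salaries are assumed to be NaN.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Salary": [10, np.nan, 5, 15, 20, np.nan],
    "Position": ["VP", "VP", "VP", "AVP", "AVP", "AVP"],
})

# transform('median') returns one value per row: the median of that row's group.
group_median = df.groupby("Position")["Salary"].transform("median")

# fillna aligns on the index, so only the missing rows take the group median.
df["Salary"] = df["Salary"].fillna(group_median)
print(df)
```

Note that the VP median is 7.5 here, not 5, because the median of two values (10 and 5) is their mean.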

If you instead want to fill with the closest existing value at or below the group median:

df['Salary'] = df['Salary'].fillna(df.Salary.sub(df.groupby('Position')
                                    .Salary
                                    .transform('median'))
                           .where(lambda x: x.le(0))
                           .groupby(df['Position'])
                           .transform('idxmax')
                           .map(df['Salary']))
print(df)
   ID  Salary Position
0   1    10.0       VP
1   2     5.0       VP
2   3     5.0       VP
3   4    15.0      AVP
4   5    20.0      AVP
5   6    15.0      AVP
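
A possibly simpler way to get the same numbers, offered as a sketch rather than a drop-in replacement: `Series.quantile` accepts `interpolation='lower'`, which picks the lower of the two middle values when a group has an even number of salaries. This is a different technique from the `idxmax` chain above, and the sample DataFrame is again an assumption rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blank salaries are assumed to be NaN.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Salary": [10, np.nan, 5, 15, 20, np.nan],
    "Position": ["VP", "VP", "VP", "AVP", "AVP", "AVP"],
})

# "Lower median" per group: with interpolation='lower', quantile returns an
# existing value instead of averaging the two middle values.
lower_median = df.groupby("Position")["Salary"].transform(
    lambda s: s.quantile(0.5, interpolation="lower")
)

# Fill only the missing salaries with their group's lower median.
df["Salary"] = df["Salary"].fillna(lower_median)
print(df)
```

For groups with an odd number of known salaries this gives the ordinary median, so it only differs from the first approach when the group size is even.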
ansev

Try this:

df['Salary'] = df.groupby('Position', group_keys=False)['Salary'].apply(lambda x: x.fillna(x.median()))

Essentially, we group the salaries by position and then fill the missing values in each group with that group's median.
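
To make the per-group step visible, here is a small sketch that computes the medians first and then looks them up by position. The sample DataFrame is an assumption rebuilt from the question, and the `medians` name is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blank salaries are assumed to be NaN.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Salary": [10, np.nan, 5, 15, 20, np.nan],
    "Position": ["VP", "VP", "VP", "AVP", "AVP", "AVP"],
})

# One median per position, indexed by the position name (NaN is skipped).
medians = df.groupby("Position")["Salary"].median()
print(medians)

# Look up each row's group median via its Position, then fill only the gaps.
df["Salary"] = df["Salary"].fillna(df["Position"].map(medians))
print(df)
```

Printing `medians` before filling is a quick way to check what value each position will receive.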

Edeki Okoh
  • https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – ansev Feb 05 '20 at 17:26
  • `apply` is only justified when a solution needs more than one `groupby` step, as in my second snippet. For the solution you propose, it is much faster to use: `df['Salary'] = df['Salary'].fillna(df.groupby('Position').Salary.transform('median'))` – ansev Feb 05 '20 at 17:31