I have a DataFrame of students from three different classes. I am trying to fill in the missing ages with the mean age of the other students in the same class. I tried two different approaches: one works and the other does not. I can't figure out why, since both seem to do exactly the same thing. Could you explain why solution B fails while solution A works?

Solution A: (Working)

df.loc[(df['Age'].isna()) & (df['Class'] == 1),'Age'] = mean_age

Solution B: (not working)

df.loc[df['Class'] == 1,'Age'].fillna(mean_age, inplace=True)
user3234112

2 Answers

IIUC:

df['Age'] = df['Age'].fillna(df.groupby('Class')['Age'].transform('mean'))

Solution B can't work because selecting with df.loc[df['Class'] == 1, 'Age'] returns a new Series (a copy of the selected rows), and fillna(..., inplace=True) modifies only that temporary object. The copy is filled, but the original DataFrame is not.
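To make the copy behaviour concrete, here is a minimal sketch. The sample data and the name subset are made up for illustration and are not from the question. The selection is stored in a variable first so the filled copy can be inspected afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three classes, two missing ages
df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 3, 3],
    "Age": [10.0, np.nan, 12.0, 20.0, np.nan, 30.0, 32.0],
})

mask = df["Class"] == 1
mean_age = df.loc[mask, "Age"].mean()  # mean() skips NaN by default

# Boolean-mask selection hands back a new Series, not a window into df
subset = df.loc[mask, "Age"]
subset.fillna(mean_age, inplace=True)  # fills only the copy

print(subset.isna().sum())     # missing values left in the copy
print(df["Age"].isna().sum())  # missing values left in the original
```

In solution B the filled copy is never bound to a name, so it is discarded as soon as the statement finishes.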

Corralien

When you select with loc using a boolean mask, you get back a copy of the selected data. Since inplace=True operates on the object it is called on, the copy is changed while the original DataFrame remains unchanged. If you change

df.loc[df['Class'] == 1,'Age'].fillna(mean_age, inplace=True)

to

df.loc[df['Class'] == 1,'Age'] = df.loc[df['Class'] == 1,'Age'].fillna(mean_age)

or (as in @Corralien's answer)

df['Age'] = df['Age'].fillna(df.groupby('Class')['Age'].transform('mean'))

then it will work as expected because in these cases the original DataFrame column is changed.
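Both fixes can be checked on a small frame. This is a sketch on made-up data. The names fixed_one, fixed_all and class_means are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three classes, two missing ages
df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 3, 3],
    "Age": [10.0, np.nan, 12.0, 20.0, np.nan, 30.0, 32.0],
})

# Fix 1: fill a single class and assign the result back through loc
fixed_one = df.copy()
mask = fixed_one["Class"] == 1
mean_age = fixed_one.loc[mask, "Age"].mean()
fixed_one.loc[mask, "Age"] = fixed_one.loc[mask, "Age"].fillna(mean_age)

# Fix 2: fill every class at once with its own mean
fixed_all = df.copy()
class_means = fixed_all.groupby("Class")["Age"].transform("mean")
fixed_all["Age"] = fixed_all["Age"].fillna(class_means)

print(fixed_one)
print(fixed_all)
```

The first fix handles only class 1, so the missing age in class 2 stays NaN. The groupby version aligns each row with the mean of its own class and fills all classes in one step.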