I have a sample DataFrame that looks roughly like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Class': [1, 2, 2, 2, 1, 2, 1, 2],
                   'Sex': [1, 0, 0, 0, 1, 1, 0, 1],
                   'Age': [15, 24, 13, 28, 29, np.nan, 34, 27]})
Which displays as:
   Class  Sex   Age
0      1    1  15.0
1      2    0  24.0
2      2    0  13.0
3      2    0  28.0
4      1    1  29.0
5      2    1   NaN
6      1    0  34.0
7      2    1  27.0
What I'd like to do is fill each NaN in the 'Age' column with the median Age of all rows that share its 'Class' and 'Sex' values.
For example, when I compute these medians with:
df.groupby(['Class', 'Sex'])['Age'].median()
and get:
Class  Sex
1      0      34.0
       1      22.0
2      0      24.0
       1      27.0
I'd like to write a function that automatically fills the existing NaN value (row 5) with 27, since that is the median Age of the rows with a Class of 2 and a Sex of 1.
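To make the target value concrete, here is a minimal sketch (assuming pandas and NumPy, and the data from the printed table above) that computes the per-group medians and looks up the group of the row whose Age is missing. The names `medians`, `missing_row`, `key` and `target` are illustrative only.

```python
import numpy as np
import pandas as pd

# Data from the question (Class column as shown in the printed table).
df = pd.DataFrame({'Class': [1, 2, 2, 2, 1, 2, 1, 2],
                   'Sex': [1, 0, 0, 0, 1, 1, 0, 1],
                   'Age': [15, 24, 13, 28, 29, np.nan, 34, 27]})

# Median Age per (Class, Sex) pair; median() skips NaN values.
medians = df.groupby(['Class', 'Sex'])['Age'].median()

# Find the row with the missing Age and read off its group's median.
missing_row = df.index[df['Age'].isna()][0]
key = (int(df.at[missing_row, 'Class']), int(df.at[missing_row, 'Sex']))
target = medians.loc[key]
print(missing_row, key, target)  # prints: 5 (2, 1) 27.0
```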
Right now I have:
df['Age'] = df.groupby(['Class', 'Sex'])['Age'].apply(lambda x: x.median() if pd.isnull(x) else x)
and am getting the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
However, very similar syntax is used in the answers to this question and this one, so I'm not sure why mine doesn't work. In particular, the latter also calls isnull inside its lambda, so it's not clear to me why that one works and mine doesn't.
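A plausible reason for the error: when a function is applied to a groupby, each call receives a whole group as a Series, not a single value. `pd.isnull` on a scalar returns one boolean, but on a Series it returns a boolean Series, and `if` cannot reduce several booleans to one. The sketch below (assuming the data from the table above; `group`, `scalar_check` and `series_check` are illustrative names) reproduces this outside of `apply`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Class': [1, 2, 2, 2, 1, 2, 1, 2],
                   'Sex': [1, 0, 0, 0, 1, 1, 0, 1],
                   'Age': [15, 24, 13, 28, 29, np.nan, 34, 27]})

# One group, as groupby would pass it to the lambda: a Series, not a scalar.
group = df.loc[(df['Class'] == 2) & (df['Sex'] == 1), 'Age']

# On a scalar (here row 5, the NaN), pd.isnull returns a single boolean.
scalar_check = pd.isnull(group.iloc[0])

# On a Series, it returns a boolean Series with one value per row.
series_check = pd.isnull(group)

# Using that Series in an `if` forces pandas to pick one truth value,
# which it refuses to do, so it raises the ValueError from the question.
try:
    if series_check:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

So a lambda that tests `pd.isnull(x)` with `if` can work when `x` is a single value, but not when `x` is a group.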
I've also tried using the fillna method like so:
df['Age'] = df['Age'].fillna(df.groupby(['Class', 'Sex'])['Age'].median())
But got the following error message:
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
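A likely cause: `fillna` with a Series fills values by matching index labels. The medians Series is indexed by (Class, Sex) pairs, while `df['Age']` is indexed by row labels 0 to 7, so the two cannot line up (the exact symptom may vary between pandas versions). One way to reuse the medians already computed is to broadcast them back onto the rows with `DataFrame.join` on the key columns. This is a sketch assuming the data from the table above; the `'AgeMedian'` column name is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Class': [1, 2, 2, 2, 1, 2, 1, 2],
                   'Sex': [1, 0, 0, 0, 1, 1, 0, 1],
                   'Age': [15, 24, 13, 28, 29, np.nan, 34, 27]})

# Indexed by (Class, Sex), not by df's row labels.
medians = df.groupby(['Class', 'Sex'])['Age'].median()

# Join the named medians Series onto df using the Class and Sex columns,
# producing one median per row, indexed like df.
per_row = df.join(medians.rename('AgeMedian'), on=['Class', 'Sex'])['AgeMedian']

# Now fillna can align per_row with df['Age'] label by label.
df['Age'] = df['Age'].fillna(per_row)
print(df['Age'].tolist())  # prints: [15.0, 24.0, 13.0, 28.0, 29.0, 27.0, 34.0, 27.0]
```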
I'm open to other methods that produce the same values, but I'd prefer something that relies entirely on pandas methods, without writing a separate loop and passing it to apply, so that it stays as concise as possible.
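A concise approach that stays within pandas, sketched here as one common option rather than the only one, is `groupby(...).transform('median')`. Unlike `median()`, `transform` returns one value per original row (the median of that row's group), indexed like `df`, so `fillna` can align it directly. The sketch assumes the data from the table above; `group_medians` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Class': [1, 2, 2, 2, 1, 2, 1, 2],
                   'Sex': [1, 0, 0, 0, 1, 1, 0, 1],
                   'Age': [15, 24, 13, 28, 29, np.nan, 34, 27]})

# One group median per row, aligned to df's index.
group_medians = df.groupby(['Class', 'Sex'])['Age'].transform('median')

# Fill only the missing ages; existing values are left untouched.
df['Age'] = df['Age'].fillna(group_medians)
print(df)
```

A variant that keeps the fill inside each group is `transform(lambda x: x.fillna(x.median()))`; it gives the same result on this data but calls a Python function once per group, whereas the string `'median'` uses pandas' built-in group median.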
Thank you.