How to replace Null values in dataframe with condition based on other columns?

Question

I have a dataframe where the Income column has missing values, and I want to replace these missing values with the average income for the row's education level.

I have already calculated the average income per education level, but I don't know how to replace the null values with these averages.

Here is the average income by education level:

income_edu_based = df.groupby("Education")["Income"].mean().apply(lambda x: round(x,2))
Education
2n Cycle      47633.19
Basic         20306.26
Graduation    52720.37
Master        52917.53
PhD           56145.31
Name: Income, dtype: float64

I want to replace each null value with the average above that matches that row's education level.

Here's the vocabulary word you wanted to google for: "imputing". When you settle on a suitable technical solution, [tell us](https://stackoverflow.com/help/self-answer) about it. — J_H, Jan 07 '23 at 06:50

score 1 · Answer 1 · answered Jan 07 '23 at 07:20

Convert the per-education mean income to a dict, map it onto the Education column, and pass the result to fillna so that only the missing Income values are replaced:

mean_salary_mapping = (df
                       .groupby("Education")
                       .agg(Income=("Income", "mean"))
                       .apply(lambda x: round(x, 2))
                       .Income.to_dict()
                       )

df["Income"] = df["Income"].fillna(df["Education"].map(mean_salary_mapping))
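To see the fillna plus map pattern end to end, here is a minimal sketch on a made-up dataframe. The toy values and the name `mean_by_edu` are assumptions for illustration and do not come from the question's data; the mapping is passed as a Series instead of a dict, which `Series.map` also accepts.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are made up for illustration only.
df = pd.DataFrame({
    "Education": ["PhD", "PhD", "Basic", "Basic", "Master"],
    "Income": [60000.0, np.nan, 20000.0, np.nan, 50000.0],
})

# Mean income per education level; mean() skips NaN values by default.
mean_by_edu = df.groupby("Education")["Income"].mean().round(2)

# Look up each row's group mean via Education, then use it
# only where Income is missing. Existing values are left untouched.
df["Income"] = df["Income"].fillna(df["Education"].map(mean_by_edu))

print(df)
```

Rows whose Income was already present keep their value, because fillna only writes into the missing positions.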

score 0 · Answer 2 · answered Jan 07 '23 at 07:24

I would proceed with this:

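import pandas as pd

# Build the lookup first: education level -> mean income.
income_dict = income_edu_based.to_dict()

def fill_income(row):
    # Use the group mean only when Income is missing.
    if pd.isnull(row["Income"]):
        return income_dict[row["Education"]]
    return row["Income"]

df["Income"] = df.apply(fill_income, axis=1)

Or, a faster way with .map:

# Map Education (not Income) to the group mean, then fill only the gaps.
df["Income"] = df["Income"].fillna(df["Education"].map(income_dict))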
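A related option that neither answer uses is `groupby(...).transform("mean")`, which skips the intermediate dict by returning the group mean already aligned to each row. The sketch below uses made-up data, and the name `group_mean` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are made up for illustration only.
df = pd.DataFrame({
    "Education": ["PhD", "PhD", "PhD", "Basic", "Basic"],
    "Income": [50000.0, 60000.0, np.nan, 20000.0, np.nan],
})

# transform("mean") returns a Series with the same index as df,
# where each row holds the mean Income of its Education group.
group_mean = df.groupby("Education")["Income"].transform("mean")

# Fill only the missing Income values with the aligned group mean.
df["Income"] = df["Income"].fillna(group_mean)

print(df)
```

Note that a group whose Income values are all missing has a NaN mean, so its rows would stay unfilled with this approach.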
