
I have a dataframe where the Income column has missing values, and I want to replace these missing values with the average income for the corresponding education level.

I have already calculated the average income per education level, but I have no idea how to replace the null values with these averages.

Here is the average income per education level:

income_edu_based = df.groupby("Education")["Income"].mean().apply(lambda x: round(x,2))
Education
2n Cycle      47633.19
Basic         20306.26
Graduation    52720.37
Master        52917.53
PhD           56145.31
Name: Income, dtype: float64

I want to replace each null value with the value above that matches the row's education level.

petezurich
  • Here's the vocabulary word you wanted to google for: "imputing". When you settle on a suitable technical solution, [tell us](https://stackoverflow.com/help/self-answer) about it. – J_H Jan 07 '23 at 06:50

2 Answers


Convert the mean income per education level to a dict, map the Education column through that dict, and pass the result to fillna so that only the missing Income values are replaced:

mean_salary_mapping = (df
                       .groupby("Education")
                       .agg(Income=("Income", "mean"))
                       .apply(lambda x: round(x, 2))
                       .Income.to_dict()
                       )

df["Income"] = df["Income"].fillna(df["Education"].map(mean_salary_mapping))
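To see the fill-and-map pattern end to end, here is a minimal self-contained sketch. The toy dataframe and its values are made up for illustration and are not the asker's data; `mean_by_edu` is a hypothetical name. It passes the grouped Series straight to `map` instead of converting it to a dict first, which should behave the same way here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only
df = pd.DataFrame({
    "Education": ["PhD", "PhD", "Basic", "Basic", "Master"],
    "Income": [60000.0, np.nan, 20000.0, np.nan, 50000.0],
})

# Mean income per education level; mean() skips NaN by default
mean_by_edu = df.groupby("Education")["Income"].mean().round(2)

# Look up each row's group mean, then fill only the missing incomes
df["Income"] = df["Income"].fillna(df["Education"].map(mean_by_edu))
```

Rows that already have an income are left untouched, because `fillna` only writes where the original value is missing.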
Jason Baker

I would proceed with this:

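import pandas as pd

# Lookup of mean income per education level
income_dict = income_edu_based.to_dict()

def fill_income(row):
    # Use the group mean only when Income is missing
    if pd.isnull(row["Income"]):
        return income_dict[row["Education"]]
    return row["Income"]

df["Income"] = df.apply(fill_income, axis=1)

Or, a faster way with .map:

# Map each row's Education to its group mean, then fill only the missing incomes
df["Income"] = df["Income"].fillna(df["Education"].map(income_dict))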
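A different route, not the one used in the answers here, is `groupby(...).transform("mean")`, which skips the intermediate lookup table. The sketch below assumes a made-up toy dataframe (`toy` and `group_mean` are hypothetical names) and is meant only to illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only
toy = pd.DataFrame({
    "Education": ["Graduation", "Graduation", "Graduation", "Master", "Master"],
    "Income": [40000.0, 60000.0, np.nan, 52000.0, np.nan],
})

# transform("mean") returns a Series aligned with toy's index,
# holding each row's group mean (NaN values are ignored in the mean)
group_mean = toy.groupby("Education")["Income"].transform("mean")

# Fill only the missing incomes with the aligned group means
toy["Income"] = toy["Income"].fillna(group_mean)
```

One caveat with this approach: if every income in a group is missing, the group mean is itself NaN, so those rows stay unfilled.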
Lorenzo Bassetti