Fill missing values in the Age column with the mean age of the players in the same sport

Question

Suppose a row has a missing Age and its Sport is Swimming. I want to replace that missing age with the mean age of all the players whose sport is Swimming, and likewise for every other sport. How can I do that?


Use `df['Age'] = df['Age'].fillna(df.groupby('Sport')['Age'].transform('mean'))` (as shown [here](/a/53339320/15497888)). Use `df.groupby('Sport')['Age'].transform('mean').astype(int)` as the fill value if you need a whole number instead of the exact average age. – Henry Ecker, Aug 05 '22 at 18:26
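To make the comment's one-liner concrete, here is a minimal sketch on a small made-up DataFrame. The sample rows are an assumption for illustration only (the asker's data is not shown); only the `Sport` and `Age` column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real DataFrame from the question is not shown.
df = pd.DataFrame({
    "Sport": ["Swimming", "Swimming", "Swimming", "Judo", "Judo", "Judo"],
    "Age": [20.0, np.nan, 24.0, 30.0, 34.0, np.nan],
})

# For every row, the mean age of that row's sport (NaN is skipped when averaging).
sport_mean = df.groupby("Sport")["Age"].transform("mean")

# Fill only the missing ages; existing values are left untouched.
df["Age"] = df["Age"].fillna(sport_mean)

print(df)
```

Because `transform` returns a Series aligned to the original index, `fillna` can match each gap to its own sport's mean. If a sport has no known ages at all, its mean is NaN and those rows stay missing.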

score 0 · Answer 1 · answered Aug 04 '22 at 20:45


This is how you can fill the age with the mean value of the column.

df['Age'] = df['Age'].fillna(int(df['Age'].mean()))

You can also use sklearn to achieve that in the whole df:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("data.csv")

# SimpleImputer expects 2D input, so select Age as a one-column array
X = df[["Age"]].values

# Replace every NaN with the mean of the column
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X = imputer.fit_transform(X)
print(X)


Luis Alejandro Vargas Ramos


`df['Age'].mean()` gives the average of the entire 'Age' column, which is not what I need. I want to fill each blank value with the mean age of its own sport, as described in the example in the question. – Subhajit Nag Aug 05 '22 at 16:44
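The difference this comment points out can be seen on a toy example. This is only a sketch: the rows are invented, and `global_fill`, `means_by_sport` and `per_sport_fill` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame({
    "Sport": ["Swimming", "Swimming", "Judo", "Judo", "Judo"],
    "Age": [20.0, np.nan, 30.0, 34.0, np.nan],
})

# Global fill: every gap receives the same overall mean, regardless of sport.
global_fill = df["Age"].fillna(df["Age"].mean())

# Per-sport fill: build a table of group means, then look up each row's sport.
means_by_sport = df.groupby("Sport")["Age"].mean()
per_sport_fill = df["Age"].fillna(df["Sport"].map(means_by_sport))

print(pd.DataFrame({"Sport": df["Sport"],
                    "global": global_fill,
                    "per_sport": per_sport_fill}))
```

With the global mean, the missing Swimming and Judo ages both get the same number; with the per-sport lookup, each gap gets the mean of its own group.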
