Suppose an Age value is missing in a row where Sport is Swimming. I want to replace that missing value with the mean age of all the players who belong to Swimming, and do the same for every other sport. How can I do that?

  • `df['Age'] = df['Age'].fillna(df.groupby('Sport')['Age'].transform('mean'))` (as [here](/a/53339320/15497888)). Use `df.groupby('Sport')['Age'].transform('mean').astype(int)` as the fill value if you need a whole number instead of the actual average age. – Henry Ecker Aug 05 '22 at 18:26
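
The approach in the comment above can be sketched end to end as follows. The sample data is hypothetical (the original table was only posted as an image); only the column names `Sport` and `Age` come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the table in the question.
df = pd.DataFrame({
    "Sport": ["Swimming", "Swimming", "Swimming", "Tennis", "Tennis", "Tennis"],
    "Age": [20.0, np.nan, 24.0, 30.0, 34.0, np.nan],
})

# Mean age per sport, broadcast back so every row carries its own sport's mean.
sport_mean = df.groupby("Sport")["Age"].transform("mean")

# Fill only the missing ages; existing values are left untouched.
df["Age"] = df["Age"].fillna(sport_mean)

print(df)
```

Note that if a sport has no known ages at all, its group mean is itself NaN, so those rows stay missing and would need a separate fallback.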

1 Answer

This is how you can fill the missing ages with the mean value of the whole Age column:

df['Age'] = df['Age'].fillna(int(df['Age'].mean()))

You can also use scikit-learn's SimpleImputer, which fills each column passed to it with that column's mean:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("data.csv")

# SimpleImputer expects 2-D input, so select the column with double brackets
X = df[["Age"]].values

# Replace each NaN with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X = imputer.fit_transform(X)
print(X)
  • `df['Age'].mean()` gives the average of the entire 'Age' column, which is not what I need. I want to fill the blank values with the mean age of the respective sport, as described in my example. – Subhajit Nag Aug 05 '22 at 16:44