Suppose a value of Age is missing in a row where the sport is Swimming. I want to replace that missing value with the mean age of all players whose sport is Swimming, and do the same for every other sport. How can I do that?
`df['Age'] = df['Age'].fillna(df.groupby('Sport')['Age'].transform('mean'))` (as [here](/a/53339320/15497888)). Use `df.groupby('Sport')['Age'].transform('mean').astype(int)` if you need a whole number instead of the actual average age. – Henry Ecker Aug 05 '22 at 18:26
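A minimal sketch of the group-wise fill suggested in the comment above, run on a small made-up DataFrame. The sample rows are hypothetical; only the column names `Sport` and `Age` come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Sport": ["Swimming", "Swimming", "Swimming", "Judo", "Judo", "Judo"],
    "Age": [20.0, 24.0, np.nan, 30.0, np.nan, 34.0],
})

# For every row, the mean age of that row's sport.
# Missing ages are skipped when the mean is computed.
sport_mean = df.groupby("Sport")["Age"].transform("mean")

# Fill only the missing ages; existing values are left untouched.
df["Age"] = df["Age"].fillna(sport_mean)

print(df)
```

Because `transform` returns a Series aligned with the original index, `fillna` can match each missing age to its own sport's mean. If every age in a sport is missing, that sport's mean is itself NaN and those rows stay unfilled.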
1 Answer
This is how you can fill the missing ages with the mean of the whole column (truncated to an integer):
df['Age'] = df['Age'].fillna(int(df['Age'].mean()))
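You can also use scikit-learn's SimpleImputer, which fills each selected numeric column with that column's mean:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
df = pd.read_csv("data.csv")
# SimpleImputer expects 2-D input, so select the column with a list to keep it 2-D
# (this assumes the first column is numeric)
X = df.iloc[:, [0]].values
# Replace each NaN with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X = imputer.fit_transform(X)
print(X)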

Luis Alejandro Vargas Ramos
`df['Age'].mean()` gives the average of the entire 'Age' column, which is not what I need. I want to fill the blank values with the mean age of the respective sport, as in the example I described. – Subhajit Nag Aug 05 '22 at 16:44