I am using the IBM HR Analytics attrition dataset from Kaggle (https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset) and have converted the object columns to category dtype:
import pandas as pd
import miceforest as mf
df = pd.read_csv('HR_Employee_Attrition.csv')
# Work on a copy so the original DataFrame is left untouched
df_Mice = df.copy()
# Convert the object (string) columns to category dtype
cat_cols = ['Attrition', 'BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole', 'MaritalStatus', 'Over18', 'OverTime']
df_Mice[cat_cols] = df_Mice[cat_cols].astype("category")
kds = mf.ImputationKernel(
    df_Mice,
    datasets=5,
    save_all_iterations=True,
    random_state=11,
)
kds.mice(5)
kds.complete_data(4)
Imputation = pd.concat([kds.complete_data(i) for i in range(5)]).groupby(level=0).mean()
Imputation.head(10)
I read the question "Mice function not getting data columns to impute", but I couldn't get it to use factors for those columns. (I think it shouldn't matter in Python, but most of the answers to MICE problems relate to the R version.)
The problem now is that the output contains only the numerical columns, not the text ones, and the missing value in the Gender column has not been dealt with at all.
How do I get it to return all 35 columns in the output?
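One possible cause, offered as a guess rather than a confirmed diagnosis: `.groupby(level=0).mean()` can only average numeric columns, so depending on the pandas version the category columns are either silently dropped or raise an error. A minimal sketch of pooling the completed datasets with a per-column aggregation (mean for numeric columns, most frequent value for the rest) is below. The `completed` frames are made-up stand-ins for the `kds.complete_data(i)` results, and `most_frequent` is a hypothetical helper, not part of miceforest or pandas.

```python
import pandas as pd

# Hypothetical stand-ins for [kds.complete_data(i) for i in range(5)];
# the values are invented purely for illustration.
genders = ["Female", "Male"]
completed = [
    pd.DataFrame({"Age": [30.0, 41.0],
                  "Gender": pd.Categorical(["Male", "Female"], categories=genders)}),
    pd.DataFrame({"Age": [32.0, 41.0],
                  "Gender": pd.Categorical(["Male", "Female"], categories=genders)}),
    pd.DataFrame({"Age": [34.0, 41.0],
                  "Gender": pd.Categorical(["Female", "Female"], categories=genders)}),
]

# Stack the datasets; rows that share an index label are the same record.
stacked = pd.concat(completed)


def most_frequent(values: pd.Series):
    """Return the most common value in a group (hypothetical helper)."""
    return values.mode().iloc[0]


# Pick an aggregation per column: mean for numeric, mode for everything else.
agg_spec = {
    col: "mean" if pd.api.types.is_numeric_dtype(stacked[col]) else most_frequent
    for col in stacked.columns
}

pooled = stacked.groupby(level=0).agg(agg_spec)
print(pooled)
```

If the same idea is applied to the real data, every column of the completed datasets should appear in `pooled`. Whether taking the mode is an appropriate way to pool categorical imputations is a separate modelling question.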