I would like to fill the 18,543 missing values in the target column (dependent variable), Complaint-Status, of a dataset that has class imbalance. The target column has five classes, so this is a multi-class classification problem.
What is the best way to fill these values without increasing the class imbalance?
Dataset
Replacing these missing values with the mode of the column, i.e. 'Closed with explanation', would only add to the class imbalance.
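To put a number on that concern, here is a minimal sketch that uses only the counts from the np.unique output in this post. The variable names (labeled, n_missing, share_before, share_after) are illustrative and not part of the original code.

```python
# Counts copied from the np.unique output in this post.
# The '' entry (missing values) is kept separately in n_missing.
labeled = {
    "Closed": 809,
    "Closed with explanation": 34300,
    "Closed with monetary relief": 2818,
    "Closed with non-monetary relief": 5018,
    "Untimely response": 321,
}
n_missing = 18543

# The mode is the label with the highest count among the labeled rows.
mode_label = max(labeled, key=labeled.get)
n_labeled = sum(labeled.values())

# Share of the mode among labeled rows, before and after mode-filling.
share_before = labeled[mode_label] / n_labeled
share_after = (labeled[mode_label] + n_missing) / (n_labeled + n_missing)

print(f"{mode_label}: {share_before:.2%} -> {share_after:.2%}")
```

Among the labeled rows the majority class already holds roughly 79% of the data, and mode-filling would push it to roughly 85%.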
import numpy as np
# Count each distinct value in the target column ('' marks the missing entries)
uniq, kounts = np.unique(df_ohe['Complaint-Status'], return_counts=True)
print(np.asarray((uniq, kounts)).T)
[['' 18543]
['Closed' 809]
['Closed with explanation' 34300]
['Closed with monetary relief' 2818]
['Closed with non-monetary relief' 5018]
['Untimely response' 321]]
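The target class percentages, with the missing values counted as their own class:
# c_count is assumed to be df_ohe['Complaint-Status'].value_counts()
100*c_count.values/c_count.values.sum()
# array([55.49353654, 30.00048537,  8.11855879,  4.55920659,  1.30887088,
#         0.51934184])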
Expected output:
[['class_label' 18543]
['Closed' 809]
['Closed with explanation' 34300]
['Closed with monetary relief' 2818]
['Closed with non-monetary relief' 5018]
['Untimely response' 321]]
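For illustration only, one way to fill the blanks without shifting the class proportions is to sample each missing label from the distribution of the observed labels. This is a sketch under assumptions, not a recommended answer: the helper name fill_by_observed_distribution, the missing_marker and seed parameters, and the toy data are all hypothetical, and the sampled labels are random draws that carry no information from the features.

```python
import numpy as np
import pandas as pd


def fill_by_observed_distribution(target, missing_marker="", seed=0):
    """Fill missing labels by sampling from the empirical label distribution.

    Hypothetical helper: 'missing' means either NaN or the marker string.
    """
    target = target.copy()
    is_missing = target.eq(missing_marker) | target.isna()

    # Relative frequency of each observed (non-missing) class.
    probs = target[~is_missing].value_counts(normalize=True)

    # Draw one label per missing row, weighted by the observed frequencies.
    rng = np.random.default_rng(seed)
    target[is_missing] = rng.choice(
        probs.index.to_numpy(),
        size=int(is_missing.sum()),
        p=probs.to_numpy(),
    )
    return target


# Toy data standing in for df_ohe['Complaint-Status'].
demo = pd.Series(["A"] * 80 + ["B"] * 20 + [""] * 50)
filled = fill_by_observed_distribution(demo)
print(filled.value_counts(normalize=True))
```

Because the draws follow the observed frequencies, the class proportions after filling stay close to those of the labeled rows in expectation, so the imbalance is neither reduced nor increased.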