I have a column 'Gender' in a synthetic DataFrame, and its value counts look like this:
    df['Gender'].value_counts()

    male       42758
    female     27170
    other      27060
    unknown     6849
    0            724
    Name: Gender, dtype: int64
I am preprocessing this dataset for linear regression. Does it make sense to merge '0' and 'unknown' into a single category and then replace those values with 'male' only because 'male' is the most frequent value (i.e. mode imputation)?
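To make the choice concrete, here is a minimal pandas sketch of the two options. It assumes the stray 0 may be stored either as the integer 0 or as the string '0'. The helper names (`clean_gender`, `impute_with_mode`, `one_hot`) and the toy sample are hypothetical and stand in for the real data. Option A folds the missing values into the mode. Option B keeps 'unknown' as its own level. Both options are then one-hot encoded, since a linear model needs numeric inputs.

```python
import pandas as pd

def clean_gender(gender: pd.Series) -> pd.Series:
    """Hypothetical helper: normalise Gender and treat '0' as missing, like 'unknown'."""
    # astype(str) turns a stray integer 0 into the string '0', so both forms match
    g = gender.astype(str).str.strip().str.lower()
    return g.replace({"0": "unknown"})

def impute_with_mode(gender: pd.Series) -> pd.Series:
    """Option A (hypothetical helper): replace 'unknown' with the most frequent known value."""
    known = gender[gender != "unknown"]
    mode = known.mode().iloc[0]
    return gender.replace({"unknown": mode})

def one_hot(gender: pd.Series) -> pd.DataFrame:
    """One-hot encode for linear regression, dropping one level to avoid perfect collinearity."""
    return pd.get_dummies(gender, prefix="Gender", drop_first=True, dtype=int)

# Tiny illustrative sample, not the real data
df = pd.DataFrame({"Gender": ["male", "female", "other", "unknown", 0, "male"]})
df["Gender"] = clean_gender(df["Gender"])

option_a = impute_with_mode(df["Gender"])  # 'unknown' rows become 'male'
option_b = df["Gender"]                    # 'unknown' kept as its own level

print(option_a.value_counts())
print(one_hot(option_b))
```

Using the counts above, option A would move 6849 + 724 = 7573 rows (about 7% of the roughly 104,561 total) into 'male'. Option B keeps those rows separate through a `Gender_unknown` indicator, so the model can estimate their effect instead of assuming it equals that of 'male'.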