The code below is what I have. It seems to work for '?', ' ' and '' but not for np.NaN. Any suggestions?
Also, I am new to Pandas/Python, so I would like to know if there is a faster way to do this.
I am thinking of treating features as suspect if more than X% (say 5%) of the rows have missing values. Are there any other initial data sanitization checks that you regularly use?
for col in df.columns:
    pcnt_missing = df[df[col].isin(['?','',' ',np.NaN])][col].count() * 100.0 / df[col].count()
    if pcnt_missing > 1:
        print(f"Col = {col}, Percent missing ={pcnt_missing:.2f}")
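
For reference, a vectorized variant of the check could look like the sketch below. Series.count() skips NaN values, which may be why the loop above never counts them, so this version detects NaN with isna() and takes the mean of a boolean mask instead of dividing two counts. The function name percent_missing, the placeholders argument and the toy DataFrame are hypothetical and only for illustration; the 5% cutoff is the threshold mentioned above.

```python
import numpy as np
import pandas as pd


def percent_missing(df, placeholders=("?", "", " ")):
    """Return the percentage of missing entries per column as a Series."""
    # isna() flags NaN/None; isin() flags the placeholder strings.
    mask = df.isna() | df.isin(list(placeholders))
    # The mean of a boolean column is the fraction of True values.
    return mask.mean() * 100.0


# Hypothetical toy frame, only for illustration.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": ["x", "?", "", " "],
    "c": ["p", "q", "r", "s"],
})

pcnt = percent_missing(df)

# Keep only the columns above the assumed 5% threshold.
suspect = pcnt[pcnt > 5]
for col, value in suspect.items():
    print(f"Col = {col}, Percent missing = {value:.2f}")
```

Because the mask is built for the whole frame at once, there is no Python-level loop over columns, and the denominator is the total row count rather than the non-NaN count.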