I have been working with the Adult Census dataset, available at https://archive.ics.uci.edu/ml/datasets/census+income
From what I have read, it contains some missing values marked with "?". I am building a classifier, so I want to replace those values with the mode, but I have run into some problems. My source code is below, with comments on the issues I have encountered:
    import pandas as pd
    from sklearn import preprocessing
    import numpy as np

    def open(fileR):
        head = ["gt lt 50", "age", "workclass", "fnlwgt", "edu", "edu-num", "mar-sta", "occ", "rela", "race", "sex", "cap-gain", "cap-loss", "country", "hpw"]
        f = pd.read_csv(fileR, sep=',')
        f.columns = head
        f.replace('?', np.nan)  # I want to replace the ? values with nan
        f = f.fillna(f.mode().iloc[:, 1])  # replace the nan values with the mode
        print(f.iloc[:, 1])
But the values I get back still contain the ? sign, for example:
25 Private
26 ?
27 Private
28 Private
29 Local-gov
I want to replace all the ? values in the categorical variables of my f dataframe with the mode of each column. Is there some step that I am missing?
P.S.
I have also tried the following to check just one column:
    f.replace('?', np.nan, inplace=True)
    f = f.fillna(f.mode().iloc[:, 1])
    print(f.iloc[:, 1])
but it still prints the ? values.
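To make the goal concrete, here is a minimal sketch of the result I am after, run on a made-up toy sample instead of the real file. The three column names and the four rows are purely illustrative. Using `skipinitialspace=True` and `na_values="?"` is an assumption on my part: the raw file seems to put a space after each comma, so the cells may really be " ?" rather than "?", which might be why the replacement never matches. Taking the first row of `mode()` with `.iloc[0]` is likewise just one possible way to get one mode per column.

```python
import io

import pandas as pd

# Toy stand-in for the census file (made-up rows; note the space after each comma).
raw = io.StringIO(
    "39, Private, Bachelors\n"
    "50, ?, HS-grad\n"
    "38, Private, ?\n"
    "53, Local-gov, HS-grad\n"
)

# skipinitialspace drops the blank after the delimiter, so "?" is matched by
# na_values and read in as NaN. header=None keeps the first row as data.
df = pd.read_csv(
    raw,
    header=None,
    names=["age", "workclass", "edu"],  # hypothetical column names
    skipinitialspace=True,
    na_values="?",
)

# mode() returns a DataFrame; its first row holds one most-frequent value per column.
modes = df.mode().iloc[0]

# fillna with a Series fills each column using the entry with the matching label.
filled = df.fillna(modes)
print(filled)
```

On this toy sample, the missing `workclass` and `edu` cells end up filled with the most frequent value of their own column, which is what I would like to happen for the real data.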
Thanks