Dealing with missing categorical values with Pandas?

Question

I have been working with the Adult Census Dataset available at: https://archive.ics.uci.edu/ml/datasets/census+income

and from what I have read, it contains some missing values marked with "?". I am building a classifier, so I want to replace those values with the mode, but I have run into some problems. My source code is below, with comments on the issues I have encountered:

import pandas as pd
from sklearn import preprocessing
import numpy as np

def open(fileR):
    head=["gt lt 50","age","workclass","fnlwgt","edu","edu-num","mar-sta","occ","rela","race","sex","cap-gain","cap-loss","country","hpw"]
    f=pd.read_csv(fileR,sep=',')
    f.columns=head
    f.replace('?',np.nan)   #I want to replace the ? values with nan 
    f = f.fillna(f.mode().iloc[:,1])        #replace the nan values with the mode
    print (f.iloc[:,1])

but the values I get still contain the ? sign, for example:

25                 Private
26                       ?
27                 Private
28                 Private
29               Local-gov

I want to replace all the ? values in the categorical variables of my f dataframe with the mode. Is there some step that I am missing?

P.S.

I have also tried the following to check just one column:

    f.replace('?',np.nan,inplace=True)
    f = f.fillna(f.mode().iloc[:,1])
    print (f.iloc[:,1])

but it still prints the ? values.

Thanks

@anky_91 I have tried that and with inplace also and still when I print the values of f the ? still appears — Little, Aug 06 '19 at 14:31
@Little This is not in fact a duplicate. Thanks for linking to the data. I had a look and the files do not use the standard comma delimiter; rather, they use a comma with a space after. So when you load the data, use `sep=', '` instead. The reason you couldn't replace the `'?'` strings is because they didn't exist in the data. They were actually `' ?'`. In fact, _all_ of the values had a space prefixing them. This will fix that issue. — brentertainer, Aug 07 '19 at 00:39
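
For anyone landing here later, the point in the comment above can be sketched roughly as follows. This is only an illustration under some assumptions: the four-row sample and the shortened column names are made up to mimic the ", " delimiter, and it uses `skipinitialspace=True` together with `na_values='?'` as an alternative to `sep=', '` (which should also work, but makes pandas fall back to the slower Python parsing engine). It also takes the per-column mode with `f.mode().iloc[0]`, since `mode()` returns a DataFrame whose first row holds the first mode of each column.

```python
import io

import pandas as pd

# Hypothetical sample that mimics the ", " delimiter used in the data files.
raw = (
    "39, State-gov, Bachelors\n"
    "50, ?, Bachelors\n"
    "38, Private, HS-grad\n"
    "53, Private, ?\n"
)
cols = ["age", "workclass", "edu"]  # shortened, illustrative column names

# With sep=',' every string field keeps its leading space,
# so a replace on '?' finds nothing to match.
naive = pd.read_csv(io.StringIO(raw), sep=",", header=None, names=cols)
print(repr(naive.loc[1, "workclass"]))

# skipinitialspace=True drops the space after each delimiter, and
# na_values='?' converts the placeholder to NaN while loading.
f = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=cols,
    skipinitialspace=True,
    na_values="?",
)

# Row 0 of mode() is a Series indexed by column name, so fillna
# fills each column with its own most frequent value.
f = f.fillna(f.mode().iloc[0])
print(f)
```

Note that `replace` without `inplace=True` returns a new DataFrame, so its result has to be assigned back if that route is used instead of `na_values`.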
