I am working with the UCI data sets, and some of them contain "?" in some lines. For example:
56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0
58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0
57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1
38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0
I first used numpy.loadtxt() to load the file and tried to delete the lines containing "?" using line.contains('?'), but got an error about the type.
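For reference, here is a minimal sketch of the loadtxt route. It assumes the sample rows shown above, wrapped in io.StringIO so the example runs without the actual file; with a real file you would iterate over open(...) on it instead (the file name is not given, so it is left out here). Python str objects have no .contains() method, so the usual membership test is the `in` operator. The filtering also has to happen before loadtxt tries to convert "?" to a float:

```python
import io
import numpy as np

# Sample rows from the question; stands in for the real UCI file.
raw = io.StringIO(
    "56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0\n"
    "58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0\n"
    "57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1\n"
    "38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0\n"
)

# str has no .contains(); the `in` operator checks for a substring.
clean_lines = [line for line in raw if "?" not in line]

# np.loadtxt accepts a list of strings as well as a filename or file object.
data = np.loadtxt(clean_lines, delimiter=",", dtype=float)
print(data.shape)  # (2, 14)
```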
Then I tried pandas.read_csv, but I still could not find an easy way to delete all lines that contain the specific character "?".
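Here is a minimal pandas sketch under the same assumption (the sample rows stand in for the real file). The na_values parameter of read_csv marks "?" as missing (NaN). After that, dropna() removes every row that had a "?", and astype(float) gives a purely numeric result:

```python
import io
import pandas as pd

# Sample rows from the question; stands in for the real UCI file.
raw = io.StringIO(
    "56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0\n"
    "58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0\n"
    "57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1\n"
    "38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0\n"
)

# header=None: the sample has no header row.
# na_values="?": parse "?" as NaN instead of keeping the column as strings.
df = pd.read_csv(raw, header=None, na_values="?")

# Drop every row containing a NaN (i.e. a former "?"), then force float dtype.
df = df.dropna().astype(float)

arr = df.to_numpy()  # plain float64 NumPy array, if one is needed
print(df.shape)  # (2, 14)
```

The header=None setting assumes the file has no header row, which is true of the sample above. Drop it if your file does have column names.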
Is there an easy way to clean the data? I need a float-type data file without any "?" in it. Thanks~
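If the goal is a cleaned float file on disk, another option is np.genfromtxt. With missing_values and filling_values it can map "?" to NaN instead of raising an error. Rows containing NaN are then dropped, and the rest is written out with np.savetxt. The sketch below assumes the sample rows again and writes to an in-memory buffer. The name "clean.csv" in the comment is only a hypothetical placeholder for a real output path:

```python
import io
import numpy as np

# Sample rows from the question; stands in for the real UCI file.
raw = io.StringIO(
    "56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0\n"
    "58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0\n"
    "57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1\n"
    "38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0\n"
)

# Treat "?" as missing and fill it with NaN.
data = np.genfromtxt(raw, delimiter=",", dtype=float,
                     missing_values="?", filling_values=np.nan)

# Keep only the rows that contain no NaN at all.
clean = data[~np.isnan(data).any(axis=1)]

# Replace `out` with a path such as "clean.csv" (hypothetical) to write a file.
# fmt="%.1f" matches the one-decimal sample; widen it for more precise data.
out = io.StringIO()
np.savetxt(out, clean, delimiter=",", fmt="%.1f")
print(out.getvalue())
```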