I'm using scikit-learn's preprocessing and random forest ensemble tools on a pandas DataFrame of 400,000 x 600 (about 800 MB). I get a ValueError when I pass this DataFrame through the algorithms, possibly because of extra spaces somewhere in the data. How do I strip all the spaces from a DataFrame that should contain only numerical values and no strings?
1 Answer
You can convert the DataFrame to a numeric type.
For instance, the DataFrame:
import pandas as pd
df = pd.DataFrame({'x': [5, 7, 9], 'y': [3, 1, '2 ']})
has an extra space in the last value, so pandas stores the y
column with object dtype instead of as an integer column. To convert it, you can use either:
df = df.astype(int) # this
df = df.astype(float) # or this
This converts every column of the DataFrame to the given type. Alternatively, you can handle the problem when the file is read (assuming the data comes from a CSV or a similar format).
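As a rough sketch of the read-time route, the example below loads a small in-memory CSV and then cleans whatever non-numeric columns remain. The CSV text, the column names, and the helper `clean_column` are made up for illustration; `skipinitialspace` and `errors="coerce"` are existing pandas options, but whether they fit your file depends on how the stray spaces appear in it.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the real file.
# Note the space before "1" and the space after "2".
raw = "x,y\n5,3\n7, 1\n9,2 \n"

# skipinitialspace=True tells the parser to drop spaces that follow a delimiter.
# Trailing spaces are not covered by this option, so a cleanup pass follows.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)


def clean_column(col):
    """Return col as numbers, stripping whitespace from string cells first."""
    if pd.api.types.is_numeric_dtype(col):
        # Already numeric: leave it alone to avoid needless work on wide frames.
        return col
    stripped = col.astype(str).str.strip()
    # errors="coerce" turns anything that still is not a number into NaN
    # instead of raising, so bad cells can be inspected afterwards.
    return pd.to_numeric(stripped, errors="coerce")


clean = df.apply(clean_column)
print(clean.dtypes)
```

Because unparseable cells become NaN rather than raising, `clean.isna().sum()` gives a per-column count of values that were not numbers, which may help locate the cells that triggered the original error.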

James