I'm using scikit-learn's preprocessing and random forest ensemble techniques on a Pandas dataframe of 400,000 rows x 600 columns (800MB). I get a ValueError when I pass this dataframe through the algorithms, possibly due to extra spaces somewhere in the dataframe. The dataframe should contain only numerical values and absolutely no strings. How do I clean all the spaces out of it?

Pearl Philip

1 Answer

You can convert the data frame to a different type.

For instance, the data frame:

import pandas as pd
df = pd.DataFrame({'x': [5, 7, 9], 'y': [3, 1, '2 ']})

has an extra space in the last value, so pandas stores the y column with object dtype instead of as an integer array. To convert it you can use either:

df = df.astype(int)     # this
df = df.astype(float)   # or this

This will convert the entire data frame to the given type. The other way is to handle it when the file is read (assuming you are reading a CSV or a similar format), so that the columns are parsed as numbers from the start.
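As a rough sketch of the read-time approach, assuming the data comes from a CSV file: the `csv_text` string below is a made-up stand-in for the real file, and the fallback loop is my own addition for columns that still fail to parse as numbers. `skipinitialspace` is a `read_csv` option that skips spaces after the delimiter.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file; note the stray spaces.
csv_text = "x,y\n5,3\n7, 1\n9, 2 \n"

# skipinitialspace=True skips spaces that follow the delimiter.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Fallback: any column that is still non-numeric is stripped and converted.
# errors="coerce" turns cells that cannot be parsed into NaN instead of raising.
for col in df.columns:
    if not pd.api.types.is_numeric_dtype(df[col]):
        df[col] = pd.to_numeric(df[col].astype(str).str.strip(), errors="coerce")

print(df.dtypes)
```

With `errors="coerce"`, any cell that is genuinely not a number ends up as NaN, so it may be worth checking `df.isna().sum()` before passing the frame to scikit-learn.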

James