Pandas dataframe, ValueError: could not convert string to float:

Question

I'm using scikit-learn preprocessing and the random forest ensemble on a Pandas dataframe of size 400,000 x 600 (800 MB). When I pass this dataframe to the algorithms I get this ValueError, possibly because of extra spaces somewhere in the dataframe. How do I clean all the spaces from a dataframe that should contain only numerical values and absolutely no strings?

score 0 · Answer 1 · answered Jan 31 '17 at 01:59

You can convert the data frame to a numeric type.

For instance the data frame:

import pandas as pd

df = pd.DataFrame({'x': [5, 7, 9], 'y': [3, 1, '2 ']})

Has an extra space in the last value ('2 '), so the y column is stored with dtype object instead of as an integer column. To convert it, you can use either:

df = df.astype(int)     # this
df = df.astype(float)   # or this

This converts every column of the data frame to the given type. The other option is to handle it when the file is read in (assuming you are loading it from a CSV or a similar format).
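A minimal runnable sketch of both approaches, using the small example frame from this answer. It checks the dtype, converts with astype(float) (Python's float() ignores surrounding whitespace, so '2 ' parses), and, assuming the data comes from a CSV, strips the stray space at read time through read_csv's converters argument. The CSV text and the variable names are hypothetical. The final pd.to_numeric variant is a swapped-in alternative, not part of the answer: it turns cells that cannot be parsed at all into NaN instead of raising.

```python
import io

import pandas as pd

# Example frame from the answer: '2 ' has a trailing space.
df = pd.DataFrame({'x': [5, 7, 9], 'y': [3, 1, '2 ']})
print(df.dtypes)  # y is reported as object because of the string '2 '

# Option 1: convert after loading. float('2 ') tolerates the whitespace,
# so astype(float) succeeds for this frame.
converted = df.astype(float)
print(converted.dtypes)

# Option 2 (assumes the data comes from a CSV; csv_text is made up):
# strip and parse the affected column while reading via `converters`.
csv_text = "x,y\n5,3\n7,1\n9,2 \n"
read_df = pd.read_csv(
    io.StringIO(csv_text),
    converters={'y': lambda s: float(s.strip())},
)
print(read_df.dtypes)

# Option 3 (swapped-in alternative): strip every column, then let
# pd.to_numeric turn anything unparseable into NaN instead of raising.
coerced = df.apply(
    lambda col: pd.to_numeric(col.astype(str).str.strip(), errors='coerce')
)
print(coerced.dtypes)
```

For a frame that should contain no strings at all, option 3 also makes leftover bad cells easy to find afterwards, for example with coerced.isna().sum().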
