I'm trying to run a Random Forest on a pandas dataframe. I know there are no nulls or infinities in the dataframe, but I keep getting a ValueError when I fit the model. Presumably this is because I have float64 columns rather than float32; I also have a lot of columns of type bool and int. Is there a way to change all the float columns to float32?

I've tried rewriting the CSV and am relatively certain the problem isn't with that. I've never had problems running random forests on float64s before so I'm not sure what's going wrong this time.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Separate the target column from the feature columns
labels = electric['electric_ratio']
electric = electric[[x for x in electric.columns if x != 'electric_ratio']]
electric_list = electric.columns
first_train, first_test, train_labels, test_labels = train_test_split(electric, labels)
rf = RandomForestRegressor(n_estimators=1000, random_state=88)
rf_1 = rf.fit(first_train, train_labels)

I expect this to fit the model, but instead I consistently get:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
MK.

3 Answers

You can use df.astype() with a dictionary for the columns you want to change with the corresponding dtype.

df = df.astype({'col1': 'object', 'col2': 'int'})
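
If the goal is to downcast only the float64 columns, the dictionary passed to astype() can be built from the frame itself. A minimal sketch, assuming a small made-up frame (the column names a, b and c are illustrative and not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with mixed dtypes
df = pd.DataFrame({
    "a": np.array([1.5, 2.5, 3.5], dtype="float64"),
    "b": np.array([1, 2, 3], dtype="int64"),
    "c": np.array([True, False, True]),
})

# Collect the float64 columns and map each one to float32
float_cols = df.select_dtypes(include="float64").columns
df = df.astype({col: "float32" for col in float_cols})

print(df.dtypes)
```

Columns that are not listed in the dictionary (here the int and bool columns) keep their original dtype.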
Zakariya

To change the dtypes of all float64 columns to float32 columns try the following:

import numpy as np

# Cast every float64 column to float32, leaving other dtypes untouched
for column in df.columns:
    if df[column].dtype == 'float64':
        df[column] = df[column].astype(np.float32)
Vink

You can use the .astype() method on any pandas object to convert data types.

Example:

import pandas as pd

x = pd.DataFrame({'col1': [True, False, True], 'col2': [1, 2, 3], 'col3': [float('nan'), 0, None]})
x = x.astype('float32')
print(x)

Out[2]: 
   col1  col2  col3
0   1.0   1.0   NaN
1   0.0   2.0   0.0
2   1.0   3.0   NaN

You then need to handle any NaN values, for example with .fillna() (see the pandas documentation for its options):

x = x.fillna(0)
print(x)

Out[3]: 
   col1  col2  col3
0   1.0   1.0   0.0
1   0.0   2.0   0.0
2   1.0   3.0   0.0
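
Before filling anything, it can help to see which columns actually hold values that would trigger the error. A rough sketch, assuming the problem values are NaN, infinity, or magnitudes above the float32 maximum; the helper name report_non_finite and the demo frame are hypothetical and not part of pandas or scikit-learn:

```python
import numpy as np
import pandas as pd

# Largest finite value representable in float32 (about 3.4e38)
FLOAT32_MAX = float(np.finfo(np.float32).max)

def report_non_finite(frame: pd.DataFrame) -> pd.Series:
    """Count, per numeric column, values that are NaN, infinite,
    or too large in magnitude to fit in float32."""
    numeric = frame.select_dtypes(include="number")
    # abs(inf) exceeds FLOAT32_MAX, so infinities are caught by the comparison
    bad = numeric.isna() | (numeric.abs() > FLOAT32_MAX)
    counts = bad.sum()
    return counts[counts > 0]

# Hypothetical demo data
demo = pd.DataFrame({
    "ok": [1.0, 2.0, 3.0],
    "has_nan": [1.0, np.nan, 3.0],
    "too_big": [1.0, 1e40, 3.0],  # finite in float64, overflows float32
})

result = report_non_finite(demo)
print(result)
```

An empty result would suggest the offending values lie elsewhere, for example in the labels rather than the feature columns.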
nickyfot