How to change datatype of multiple columns in pandas

Question

I'm trying to run a Random Forest on a pandas dataframe. I know there are no nulls or infinities in the dataframe but continually get a ValueError when I fit the model. Presumably this is because I have float64 columns rather than float32; I also have a lot of columns of type bool and int. Is there a way to change all the float columns to float32?

I've tried rewriting the CSV and am relatively certain the problem isn't with that. I've never had problems running random forests on float64s before so I'm not sure what's going wrong this time.

labels = electric['electric_ratio']
electric = electric[[x for x in electric.columns if x != 'electric_ratio']]
electric_list = electric.columns
first_train, first_test, train_labels, test_labels = train_test_split(electric, labels)
rf = RandomForestRegressor(n_estimators = 1000, random_state=88)
rf_1 = rf.fit(first_train, train_labels)

I expect this to fit the model, but instead I consistently get:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Possible duplicate of [Change data type of columns in Pandas](https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas) — MyNameIsCaleb, Apr 24 '19 at 16:02

score 35 · Answer 1 · answered Nov 23 '21 at 10:30

You can use df.astype() with a dictionary that maps each column you want to change to its target dtype.

df = df.astype({'col1': 'object', 'col2': 'int'})
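
When the goal is to convert every float64 column, as in the question, the dictionary does not have to be typed out by hand. The following is a minimal sketch, assuming a small made-up frame (the column names a, b, c, d are hypothetical), that builds the mapping with select_dtypes and passes it to astype:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed dtypes, standing in for the asker's data.
df = pd.DataFrame({
    "a": [1.5, 2.5, 3.5],      # float64
    "b": [1, 2, 3],            # integer
    "c": [True, False, True],  # bool
    "d": [0.1, 0.2, 0.3],      # float64
})

# Collect the names of all float64 columns.
float_cols = df.select_dtypes(include="float64").columns

# Map each of them to float32; columns missing from the mapping keep their dtype.
df = df.astype({col: "float32" for col in float_cols})
```

Columns of type bool and int are untouched here because they never enter the mapping.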

Zakariya


score 13 · Answer 2 · answered Apr 24 '19 at 15:58

To change all float64 columns to float32, try the following:

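import numpy as np

# Cast each float64 column to float32; columns of other dtypes are left unchanged.
for column in df.columns:
    if df[column].dtype == 'float64':
        df[column] = df[column].astype(np.float32)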

Vink


score 0 · Answer 3 · answered Apr 24 '19 at 15:47

You can use the .astype() method on any pandas object to convert data types.

Example:

import pandas as pd

x = pd.DataFrame({'col1': [True, False, True], 'col2': [1, 2, 3], 'col3': [float('nan'), 0, None]})
x = x.astype('float32')
print(x)

Out[2]: 
   col1  col2  col3
0   1.0   1.0   NaN
1   0.0   2.0   0.0
2   1.0   3.0   NaN

You then need to handle any NaN values, for example with .fillna() (see the pandas documentation for DataFrame.fillna).

x = x.fillna(0)
print(x)
Out[3]: 
   col1  col2  col3
0   1.0   1.0   0.0
1   0.0   2.0   0.0
2   1.0   3.0   0.0
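
Since the error message names three possible causes (NaN, infinity, or a value too large for float32), it can help to count offending values per column before deciding how to fill or clip them. The following is a rough sketch, not a definitive diagnostic: the helper name report_unsafe_for_float32 and the sample frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def report_unsafe_for_float32(df):
    """Count, per numeric column, values that are NaN, infinite, or beyond float32 range."""
    # bool columns are not selected by "number"; they cannot hold such values anyway.
    numeric = df.select_dtypes(include="number").astype("float64")
    limit = np.finfo(np.float32).max        # largest finite float32 value
    nonfinite = ~np.isfinite(numeric)       # True for NaN and +/- infinity
    too_large = numeric.abs() > limit       # finite in float64 but overflows float32
    counts = (nonfinite | too_large).sum()  # per-column totals
    return counts[counts > 0]               # keep only columns with problems

# Hypothetical data: one NaN in x, one value too large for float32 in y.
sample = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [1e39, 2.0, 3.0],
    "z": [1, 2, 3],
})
problems = report_unsafe_for_float32(sample)
print(problems)
```

An empty result suggests the cause lies elsewhere, for instance in the labels rather than the feature columns.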
