Convert dataframe columns of object type to float

Question

I want to convert all the non-float columns of my dataframe to float. Is there a way to do it, ideally in one go? Below are the column types:

longitude           - float64
latitude            - float64
housing_median_age  - float64
total_rooms         - float64
total_bedrooms      - object
population          - float64
households          - float64
median_income       - float64
rooms_per_household - float64
category_<1H OCEAN  - uint8
category_INLAND     - uint8
category_ISLAND     - uint8
category_NEAR BAY   - uint8
category_NEAR OCEAN - uint8

Below is a snippet of my code:

import pandas as pd
import numpy as np 
from sklearn.model_selection import KFold

df = pd.DataFrame(housing)
df['ocean_proximity'] = pd.Categorical(df['ocean_proximity']) #type casting 
dfDummies = pd.get_dummies(df['ocean_proximity'], prefix = 'category' )
df = pd.concat([df, dfDummies], axis=1)
print(df.head())
housingdata = df
hf = housingdata.drop(['median_house_value','ocean_proximity'], axis=1)
hl = housingdata[['median_house_value']]
hf.fillna(hf.mean,inplace = True)
hl.fillna(hf.mean,inplace = True)

Check out this thread: https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas – Julia, Jul 01 '18 at 02:03

jpp · Accepted Answer · 2018-07-01T02:30:22.917

22

A quick and easy method, if you don't need specific control over downcasting or error-handling, is to use df = df.astype(float).
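As a rough illustration of where astype(float) is enough and where it is not, here is a minimal sketch. The frames `clean` and `dirty` are made up for this example and are not the asker's data; the point is only that a single unparseable string makes the cast raise.

```python
import pandas as pd

# Hypothetical frames for illustration only.
clean = pd.DataFrame({"a": ["1", "2"], "b": [3, 4]})
dirty = pd.DataFrame({"a": ["1", "n/a"], "b": [3, 4]})

# Every value parses as a number, so the whole frame becomes float64.
clean_float = clean.astype(float)
print(clean_float.dtypes)

# A single non-numeric string makes astype raise instead of converting.
try:
    dirty.astype(float)
    astype_failed = False
except ValueError as exc:
    astype_failed = True
    print("astype failed:", exc)
```

If the cast raises like this, pd.to_numeric with errors='coerce' is the more forgiving route.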

For more control, you can use pd.DataFrame.select_dtypes to select columns by dtype. Then use pd.to_numeric on a subset of columns.

Setup

df = pd.DataFrame([['656', 341.341, 4535],
                   ['545', 4325.132, 562]],
                  columns=['col1', 'col2', 'col3'])

print(df.dtypes)

col1     object
col2    float64
col3      int64
dtype: object

Solution

cols = df.select_dtypes(exclude=['float']).columns

df[cols] = df[cols].apply(pd.to_numeric, downcast='float', errors='coerce')

Result

print(df.dtypes)

col1    float32
col2    float64
col3    float32
dtype: object

print(df)

    col1      col2    col3
0  656.0   341.341  4535.0
1  545.0  4325.132   562.0
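Because errors='coerce' silently turns unparseable values into NaN, it can be worth counting how many values were lost before overwriting the columns. The following is a sketch under assumed data (the 'bad' entry and the variable names `converted` and `newly_missing` are hypothetical), not part of the original solution.

```python
import pandas as pd

# Hypothetical data: one entry in col1 is not a valid number.
df = pd.DataFrame({"col1": ["656", "bad"], "col2": [341.341, 4325.132]})

cols = df.select_dtypes(exclude=["float"]).columns
before = df[cols].isna().sum()
converted = df[cols].apply(pd.to_numeric, errors="coerce")

# Values that were present before but are NaN now failed to parse.
newly_missing = converted.isna().sum() - before
print(newly_missing)

df[cols] = converted
print(df.dtypes)
```

A non-zero count points at columns that contain text which is not numeric and may need cleaning first.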

edited Jul 01 '18 at 02:30

answered Jul 01 '18 at 02:02

jpp


1

Thanks for the help, but I tried it and it's not changing. – avik Jul 01 '18 at 02:11
cols = hf.select_dtypes(exclude=['float']).columns; hf[cols] = hf[cols].apply(pd.to_numeric, downcast='float', errors='coerce') – avik Jul 01 '18 at 02:12
@avik, Can't reproduce. You should define precisely a dataframe where this solution doesn't work, e.g. define `pd.DataFrame(...)`. – jpp Jul 01 '18 at 02:22
3

Why not just `df.astype(float)`? @jpp – Pyd Jul 01 '18 at 02:26
1

@pyd, That would work too, I've updated with this suggestion. `pd.to_numeric` has more control, you can downcast where possible and can specify what happens with errors. – jpp Jul 01 '18 at 02:27
@pyd, when I used hf.astype(float), the error was "float argument must be string or number". – avik Jul 01 '18 at 17:04
@avik, Then use the `pd.to_numeric` method. Unfortunately, I can't help debug your problem unless you can provide a [mcve]. E.g. build a *minimal* dataframe from scratch `pd.DataFrame(....)` to demonstrate your problem. – jpp Jul 01 '18 at 17:09
@pyd, thanks for your help. I was able to convert all the columns except the object-type one, total_bedrooms. Example of the code I used: hf['category_NEAR OCEAN'] = hf['category_NEAR OCEAN'].astype(float) – avik Jul 01 '18 at 17:51
Also, what I found is that using "hf.fillna(hf.mean,inplace = True)" converts the dtype of total_bedrooms to object. – avik Jul 01 '18 at 18:11
1

Can you create a new question with your actual df and expected df? – Pyd Jul 02 '18 at 04:04
Thank you. The error was due to me using hf.fillna(hf.mean,inplace = True); it should be hf.fillna(hf.mean(),inplace = True). – avik Jul 02 '18 at 17:24
pd.to_numeric didn't work for me. astype(float) did. – DDR Oct 01 '19 at 07:27
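The root cause reported in the comments above (passing the method hf.mean instead of the result of hf.mean()) can be sketched as follows. The small `hf` frame is a hypothetical stand-in for the asker's data, and the sketch assigns the result instead of using inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's feature frame.
hf = pd.DataFrame({"total_rooms": [880.0, 7099.0, 1467.0],
                   "total_bedrooms": [129.0, np.nan, 190.0]})

# hf.mean (no parentheses) is the method object itself, not the column means,
# so fillna(hf.mean) cannot fill the gaps with numbers.
# hf.mean() returns a Series of per-column means, which fillna matches by column name.
hf = hf.fillna(hf.mean())

print(hf.dtypes)
print(hf)
```

With the parentheses in place, the missing total_bedrooms value is replaced by that column's mean and the column stays float64.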

score 0 · Answer 2 · answered May 12 '21 at 22:03

0

Enumerate the columns, convert the object-type ones to numeric, and insert each column into a new dataframe:

New_DataFrame = pd.DataFrame()
for i, name in enumerate(df.columns):
    # Coerce object columns to numeric (unparseable values become NaN); copy the rest unchanged.
    col = pd.to_numeric(df[name], errors="coerce") if df[name].dtype.name == 'object' else df[name]
    New_DataFrame.insert(i, name, col, True)
print(New_DataFrame.head())

answered May 12 '21 at 22:03

balaji dileep kumar


Add downcast='float' to pd.to_numeric if you need a smaller float type. – balaji dileep kumar May 12 '21 at 22:07
