How to change dtype of each dataframe column to float if it best fits?

Question

I have an excel that has columns with values like 1.32131**, among columns of dtype string. As a result the dtypes of these columns in the dataframe are object. I have cleaned the asterisks from the dataframe and now I need to convert the dtypes of these columns to float64. I am aware of ways to do it if I define the columns that need to be changed or the desired dtypes for each column (like the functions mentioned here), but I have too many columns to use such solutions. Thus I am looking for a more efficient and clean way.

For example, if I wanted to convert to int64 I would use convert_dtypes(), but it seems that it doesn't support floats and it returns these columns with object dtype.

Then, if possible, convert to StringDtype, BooleanDtype or an appropriate integer extension type, otherwise leave as object.

Right now I am using the following script that works but I think it's to big for its purpose and it a bit slow.

# Create df and clean it
# note that the data exist in an excel normally and the dict is only for reproducibility purposes
dict = {'Name':['BPh1', 'BPh2', 'BPh3', 'BPh4', 'BPh5', 'BPh6', 'BPh7'], 'BBB':['2.00755**', '2.7766**', '0.490127**','0.490127**', '0.87667**', '0.899189**', '3.084**'], 'Buffer_solubility_mg_L':['0.00112934**','0.000798559**', '0.000218191**', '0.000122249**', '0.00382848**', '0.00109165**', '0.000665366**'], 'CYP_2C19_inhibition':['Inhibitor','Inhibitor','Non','Non','Inhibitor','Inhibitor',
'Inhibitor']}

ss =  pd.DataFrame(dict).replace("\*",'',regex=True)

# Convert dtype to float when possible
for col in ss.columns[1:]:
    print(col,'\n',ss[col].dtypes)
    try:
        ss[col] = pd.to_numeric(ss[col])
    except:
        pass
    print(ss[col].dtypes,'\n')

Is there a cleaner way to do this conversion?

Does this answer your question? [Change column type in pandas](https://stackoverflow.com/questions/15891038/change-column-type-in-pandas) -- I mean `astype()` or `infer_objects()` in the first answer that you haven't tried. — Bill Huang, Nov 25 '20 at 20:14
I dont think so because I have to define the columns or the dtypes in case of `astype()` or I have to already have floats in object dtype in the case of `infer_objects()` — Charalamm, Nov 25 '20 at 20:28

score 0 · Answer 1 · answered Nov 27 '20 at 08:49

0

I'd change/clean the values before creating the dataframe, that way you're not first creating one, and then converting it to something else (might save a little bit of time as well). The advantage is that you can do it in a single line. I don't think you can get much faster than this, given the input data that you have to work with.

import pandas

# Create df and clean it
dict = {'Name':['BPh1', 'BPh2', 'BPh3', 'BPh4', 'BPh5', 'BPh6', 'BPh7'], 'BBB':['2.00755**', '2.7766**', '0.490127**','0.490127**', '0.87667**', '0.899189**', '3.084**'], 'Buffer_solubility_mg_L':['0.00112934**','0.000798559**', '0.000218191**', '0.000122249**', '0.00382848**', '0.00109165**', '0.000665366**'], 'CYP_2C19_inhibition':['Inhibitor','Inhibitor','Non','Non','Inhibitor','Inhibitor',
'Inhibitor']}

# Perform the conversion on creation
df = pandas.DataFrame(
    {
        col: pandas.to_numeric([v.replace("*", "") for v in values], errors="ignore")
        for col, values in dict.items()
    }
)

answered Nov 27 '20 at 08:49

Gijs Wobben

1,974
1
10
13

thanks for answering. do you think its possible with an excel? my data are in an excel. the dict exist only for reproducibility purposes – Charalamm Nov 27 '20 at 11:58
Why not? Just read the dataframe from your Excel file: `dict = pandas.read_excel("./myexcelfile.xlsx").to_dict(orient="rows")` – Gijs Wobben Nov 27 '20 at 12:08
and then recreate the dataframe from the newly created dict? – Charalamm Nov 27 '20 at 12:29
Yep. Of course it's possible to do some of this stuff also within Excel (e.g. find all and replace) – Gijs Wobben Nov 27 '20 at 12:46

How to change dtype of each dataframe column to float if it best fits?

1 Answers1