I have an excel that has columns with values like 1.32131**
, among columns of dtype string
. As a result the dtypes of these columns in the dataframe are object
. I have cleaned the asterisks from the dataframe and now I need to convert the dtypes of these columns to float64
. I am aware of ways to do it if I define the columns that need to be changed or the desired dtypes for each column (like the functions mentioned here), but I have too many columns to use such solutions. Thus I am looking for a more efficient and clean way.
For example, if I wanted to convert to int64
I would use convert_dtypes()
, but it seems that it doesn't support floats and it returns these columns with object
dtype.
Then, if possible, convert to StringDtype, BooleanDtype or an appropriate integer extension type, otherwise leave as object.
Right now I am using the following script that works but I think it's to big for its purpose and it a bit slow.
# Create df and clean it
# note that the data exist in an excel normally and the dict is only for reproducibility purposes
dict = {'Name':['BPh1', 'BPh2', 'BPh3', 'BPh4', 'BPh5', 'BPh6', 'BPh7'], 'BBB':['2.00755**', '2.7766**', '0.490127**','0.490127**', '0.87667**', '0.899189**', '3.084**'], 'Buffer_solubility_mg_L':['0.00112934**','0.000798559**', '0.000218191**', '0.000122249**', '0.00382848**', '0.00109165**', '0.000665366**'], 'CYP_2C19_inhibition':['Inhibitor','Inhibitor','Non','Non','Inhibitor','Inhibitor',
'Inhibitor']}
ss = pd.DataFrame(dict).replace("\*",'',regex=True)
# Convert dtype to float when possible
for col in ss.columns[1:]:
print(col,'\n',ss[col].dtypes)
try:
ss[col] = pd.to_numeric(ss[col])
except:
pass
print(ss[col].dtypes,'\n')
Is there a cleaner way to do this conversion?