I have some data to convert to a DataFrame. Take the data below, for example.
import pandas as pd

df_raw = [
    ("Madhan", 0, 9.34),
    ("Kumar", None, 7.6),
]
When I convert this to a pandas DataFrame, the int column is automatically converted to float.
pd.DataFrame(df_raw)
        0    1     2
0  Madhan  0.0  9.34
1   Kumar  NaN  7.60
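For reference, here is a minimal reproduction that also prints the inferred dtypes. As far as I understand it, the None in column 1 is stored as NaN, which is a float, so pandas upcasts the whole column to float64 (the variable name `df` is just for illustration):

```python
import pandas as pd

df_raw = [
    ("Madhan", 0, 9.34),
    ("Kumar", None, 7.6),
]

df = pd.DataFrame(df_raw)

# Column 1 holds an int and a None. With default inference the None
# becomes NaN (a float), so the column ends up as float64.
print(df.dtypes)
print(df[1].tolist())
```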
How do I avoid this?
What I tried:
It's actually fine for me as long as the text of the elements in the DataFrame doesn't change. So I tried defining the DataFrame with the column dtype as string or pd.StringDtype(), neither of which works; both give the same result.
pd.DataFrame(df_raw, dtype=str)
        0    1     2
0  Madhan  0.0  9.34
1   Kumar  NaN   7.6
pd.DataFrame(df_raw, dtype=pd.StringDtype())
        0     1     2
0  Madhan   0.0  9.34
1   Kumar  <NA>   7.6
Also, please don't suggest converting the integer columns to a nullable int type like pd.Int64Dtype() or Int64, because I won't know which columns are integer columns; this is part of an automation.
I also can't convert each element to a string individually, because the DataFrame can sometimes be huge and doing so might take too long.
Edit: convert_dtypes also doesn't work if the number is large, as shown.
df_raw = [
    ("Madhan", 5, 9.34, None),
    ("Kumar", None, 7.6, 34534543454),
]
pd.DataFrame(df_raw).convert_dtypes()
        0     1     2              3
0  Madhan     5  9.34           <NA>
1   Kumar  <NA>   7.6  34534543454.0