I have this CSV file:
id,adset_id,source
1,,google
2,23843814084680281,facebook
3,,google
4,23843814088700279,facebook
5,23843704830370464,facebook
My problem comes up when I read it with pandas. Because I have not found a way to pass a schema, pandas infers the adset_id
column as float64 (because of the NaN values).
So if I write this:
import pandas as pd
df = pd.read_csv('/Users/test/Desktop/float.csv')
print(df)
I get scientific notation for adset_id. Result:
   id      adset_id    source
0   1           NaN    google
1   2  2.384381e+16  facebook
2   3           NaN    google
3   4  2.384381e+16  facebook
4   5  2.384370e+16  facebook
I could not find any way to fix this, so I tried a hack: converting the number to a string. To do that, I first have to convert it to int64
and then convert it to a string.
import pandas as pd
import numpy as np
df = pd.read_csv('/Users/test/Desktop/float.csv')
df = df.fillna({'adset_id':-1})
df['adset_id'] = df['adset_id'].astype('int64')
df['adset_id'] = df['adset_id'].astype('str')
df['adset_id'] = df['adset_id'].replace('-1', np.nan)
print(df)
The result is:
   id           adset_id    source
0   1                NaN    google
1   2  23843814084680280  facebook
2   3                NaN    google
3   4  23843814088700280  facebook
4   5  23843704830370464  facebook
As you can see, two of my adset_id values got rounded:
23843814084680281 -> 23843814084680280
23843814088700279 -> 23843814088700280
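As far as I can tell, the rounding comes from float64 itself rather than from the string conversion: float64 has a 53-bit significand, so integers above 2**53 cannot all be represented exactly. The following is a minimal sketch in plain Python (no pandas), assuming that the int -> float -> int round trip mirrors what happens to the column; the variable names are only for illustration.

```python
# The three non-empty adset_id values from the CSV above.
adset_ids = [23843814084680281, 23843814088700279, 23843704830370464]

# float64 has a 53-bit significand; every ID here is larger than 2**53,
# so some of them have no exact float64 representation.
print(all(value > 2**53 for value in adset_ids))

# Round-trip each ID through a Python float (which is a float64).
round_tripped = [int(float(value)) for value in adset_ids]

for original, result in zip(adset_ids, round_tripped):
    status = 'exact' if original == result else 'changed'
    print(original, '->', result, status)
```

The first two IDs come back changed and the third is exact, which matches the output above.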
I just want to be able to read this CSV into a pandas DataFrame without adset_id being shown in scientific notation
and without the values being rounded. Any solution would be appreciated.
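For reference, one approach that may avoid the float64 conversion altogether is the dtype parameter of read_csv, which lets a single column be read as text so that it is never parsed as a number. This is only a sketch, assuming the IDs need to be preserved exactly but not used in arithmetic; the inline CSV text (csv_text) stands in for the file on disk.

```python
import io

import pandas as pd

# Inline copy of the CSV from the question, standing in for float.csv.
csv_text = """id,adset_id,source
1,,google
2,23843814084680281,facebook
3,,google
4,23843814088700279,facebook
5,23843704830370464,facebook
"""

# Read adset_id as strings; empty fields are still treated as missing values.
df = pd.read_csv(io.StringIO(csv_text), dtype={'adset_id': str})
print(df)
```

With the real file, the same dtype argument would be passed to pd.read_csv together with the file path.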