Why does read_csv convert all my columns to 'object' dtype? I want to read a 10 GB CSV (float and int columns) into a pandas DataFrame. I don't hit this issue (every numeric column coming back as object) when I read a smaller file (100 MB or less) with either pandas or dask.
I tried specifying the dtypes explicitly in read_csv, but still ended up with objects (verified after the read with df.dtypes):
import numpy as np
import pandas as pd

file = 'D:/path/combine.csv'
data_type = {'Lat': np.float32, 'Long': np.float32,
             'HorizontalAccuracy': np.int32, 'RSRP': np.int32}
data = pd.read_csv(file, low_memory=False, dtype=data_type)
data.dtypes
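For what it's worth, here is a tiny reproduction of what I think is happening (made-up data, just to illustrate): a single non-numeric token in a column forces pandas to fall back to object dtype for the whole column.

import io
import pandas as pd

# One stray non-numeric value in an otherwise-float column makes
# pandas read the entire column as object.
sample = io.StringIO('Lat,Long\n1.5,2.5\noops,3.5\n')
df = pd.read_csv(sample)
print(df.dtypes)  # Lat: object (because of 'oops'), Long: float64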
I also tried reading just the header row to get the column names, building the dtype dict automatically, and then reading the file with those dtypes; still ended up with all objects:
file = 'D:/path/combine.csv'

# Read only the header row to pick up the column names.
col_names = pd.read_csv(file, nrows=0).columns

# Copy the explicit dtypes from above, then default every other column to int64.
types_dict = dict(data_type)
types_dict.update({col: np.int64 for col in col_names if col not in types_dict})

data = pd.read_csv(file, low_memory=False, dtype=types_dict)
data.dtypes
TypeError: Cannot cast array from dtype('O') to dtype('float32') according to the rule 'safe'

During handling of the above exception, another exception occurred:

ValueError: could not convert string to float: '\x1a'
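'\x1a' is the SUB control character (the old DOS end-of-file marker), so the combined file seems to contain stray binary bytes. A rough scan like this (just a sketch, reading the file in binary) should locate the first one:

with open('D:/path/combine.csv', 'rb') as f:
    offset = 0
    while True:
        block = f.read(1 << 20)  # 1 MB at a time
        if not block:
            break
        pos = block.find(b'\x1a')
        if pos != -1:
            print('found 0x1a at byte', offset + pos)
            break
        offset += len(block)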
I also tried read_csv with dask, again specifying dtype explicitly, and got an error about not being able to convert a string to float:
import dask.dataframe as dd
import numpy as np

file = 'D:/path/combine.csv'
data_type = {'Lat': np.float32, 'Long': np.float32,
             'HorizontalAccuracy': np.int32, 'RSRP': np.int32}
ddf = dd.read_csv(file, dtype=data_type)
ddf.compute()
TypeError: Cannot cast array from dtype('O') to dtype('float32') according to the rule 'safe'

ValueError: could not convert string to float: 'Latitude'
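'Latitude' looks like a header token, so I suspect the concatenation that produced combine.csv left extra header rows in the middle of the data (possibly from source files whose columns were named 'Latitude' rather than 'Lat'). A quick line-by-line scan like this (just a sketch) would confirm:

with open('D:/path/combine.csv', 'r', errors='replace') as f:
    # Collect the line numbers where a header-only token shows up in the data.
    hits = [i for i, line in enumerate(f) if 'Latitude' in line]
print('lines containing "Latitude":', hits[:10], 'total:', len(hits))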