It looks as if fread detects the type of each column and then assigns the lowest type that fits what the column contains. From the fread documentation:
A sample of 1,000 rows is used to determine column types (100 rows
from 10 points). The lowest type for each column is chosen from the
ordered list: logical, integer, integer64, double, character. This
enables fread to allocate exactly the right number of rows, with
columns of the right type, up front once. The file may of course still
contain data of a higher type in rows outside the sample. In that
case, the column types are bumped mid read and the data read on
previous rows is coerced.
This means that if a column holds mostly numeric values, fread might type the column as numeric, but if it then finds any character values later on, it will coerce everything read up to that point to character.
You can read about these type conversions here, but the long and short of it seems to be that going the other way loses information: converting a character column to numeric turns any value that is not a valid number into NA, and converting a double to an integer can lose precision.
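A minimal base R sketch of both conversions, using made-up values rather than anything read by fread:

```r
# Character to numeric: entries that are not valid numbers become NA.
# as.numeric() also emits a warning here, suppressed for readability.
x <- c("1.5", "2", "abc")
x_num <- suppressWarnings(as.numeric(x))
print(x_num)  # the third element is NA

# Double to integer: the fractional part is dropped.
y <- c(3.7, 2.0)
y_int <- as.integer(y)
print(y_int)  # 3.7 becomes 3
```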
You might be okay with this loss of precision, but fread will not allow you to do this conversion using colClasses. You might want to go in and remove non-numeric values yourself.
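One way to do that cleanup, sketched in base R under the assumption that the column has already been read as character (for example by asking for character in colClasses, which is an upgrade and so should be allowed). The helper name clean_numeric and the sample values are hypothetical:

```r
# Hypothetical helper: convert a character vector to numeric and report
# which entries could not be parsed, so they can be inspected or fixed.
clean_numeric <- function(x) {
  x <- trimws(x)
  num <- suppressWarnings(as.numeric(x))
  # "bad" entries were present in the input but did not parse as numbers
  bad <- !is.na(x) & is.na(num)
  list(values = num, bad_entries = unique(x[bad]))
}

# Made-up column contents standing in for a column read as character
raw <- c("3.5", "n/a", "7", NA, " 12 ")
res <- clean_numeric(raw)
print(res$values)       # numeric vector; unparseable and missing entries are NA
print(res$bad_entries)  # the distinct strings that failed to parse
```

Looking at bad_entries before overwriting the column shows exactly which values would be turned into NA, so you can decide whether to drop them, recode them, or fix the file.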