I am not sure I understand the behavior of fread
regarding empty strings. For instance:
rawdata <- 'a,b\n"",""\nabc,2020-12-31 00:00:00'
fread(rawdata,na.strings=c("","NA"))
##      a                   b
## 1:
## 2: abc 2020-12-31 00:00:00
I was expecting NA in the first row. Are my assumptions flawed?
Along the same lines, is it possible to have full control over colClasses
and na.strings
at the same time?
Say I want to read columns a and b as character.
rawdata <- 'a,b\n"",""\n1,2020-12-31 00:00:00'
fread(rawdata, na.strings = c("", "NA"),
      colClasses = c(a = "character",
                     b = "character"))
I'm using data.table_1.13.6
Update
Part of the question has already been answered here.
It seems that fread
uses a different parser than read.csv,
which might result in unexpected behavior.
One solution could be to replace all empty strings with NA after reading;
see here. But I am not sure this process is faster than read_csv.
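A minimal sketch of that replacement step, assuming (as the output above suggests) that quoted empty fields come back from fread as empty strings in character columns. The names dt and char_cols are only illustrative, and I have not benchmarked this against read_csv.

```r
library(data.table)

rawdata <- 'a,b\n"",""\nabc,2020-12-31 00:00:00'

# Read both columns as character so the empty fields stay as "" strings.
dt <- fread(rawdata, colClasses = c(a = "character", b = "character"))

# Find the character columns, then overwrite "" with NA by reference.
char_cols <- names(dt)[vapply(dt, is.character, logical(1))]
for (col in char_cols) {
  empty_rows <- which(dt[[col]] == "")  # which() skips any existing NA
  set(dt, i = empty_rows, j = col, value = NA_character_)
}

print(dt)
```

Since set() modifies the table in place, this avoids copying the data, which may matter on large files.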