I am trying to read a CSV file using the fread function from the data.table package, but I am running into the "embedded nul in string" error.
I am on Linux (Antergos), so a shell script that solves this problem is also welcome.
The CSV file I am trying to read is available in my Dropbox through this link (30MB zipped file).
The reproducible example is quite straightforward:
fread("20170630_Remuneracao.csv", encoding = 'Latin-1', dec = ',')
Error in fread("r.csv", encoding = "Latin-1", dec = ",") :
embedded nul in string: '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\000'
Approaches tried:
sed 's/\\0//g' 20170630_Remuneracao.csv > 20170630_Remuneracao.csv
as suggested here.
When I read the file again, it gives the same error as before.
    file <- "file.csv"
    tt <- tempfile()  # or tempfile(tmpdir="/dev/shm")
    system(paste0("tr < ", file, " -d '\\000' >", tt))
    fread(tt)
From this answer. The code above solves the embedded nul error, but introduces a new error:
    Expected sep (' ') but new line or EOF ends field 28 on line 28333 when reading data
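As a sanity check on the two attempts above, here is a minimal shell sketch (not taken from either linked answer) that counts NUL bytes before and after stripping them. It runs on a small throwaway sample rather than the real CSV; the paths and the count_nul helper are hypothetical names chosen for illustration. It writes the cleaned output to a different file, because redirecting a command's output onto its own input file truncates that file before it is read.

```shell
#!/bin/sh
# Hypothetical demo paths; substitute the real CSV for "$in".
workdir=$(mktemp -d)
in="$workdir/sample.csv"
out="$workdir/sample_clean.csv"

# Build a tiny sample that contains three NUL bytes.
printf 'a;b\n1\000;2\000\000\n' > "$in"

# Count NUL bytes: delete every byte except NUL, then count what is left.
count_nul() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

before=$(count_nul "$in")

# Strip NULs into a separate file (never "> $in", which would empty the input).
tr -d '\000' < "$in" > "$out"

after=$(count_nul "$out")

echo "NUL bytes before: $before, after: $after"
```

Pointing `in` at the real CSV should show whether any NUL bytes survive the stripping step. It does not by itself explain the field-count error on line 28333.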
Any suggestions?
Thanks!
EDIT
After reading this answer, I suspected that I could have the same problem. So I opened the file with LibreOffice Calc and saved it again in the appropriate CSV format. This works, but since it is not a proper solution, I will leave this question open, hoping for a solution that can be applied programmatically.