
I am trying to read a CSV file using the fread function from the data.table package, but I am running into the "embedded nul in string" error.

I am on Linux (Antergos), so any shell script that solves this problem is also welcome.

The CSV file I am trying to read is available in my Dropbox through this link (30MB zipped file).

The reproducible example is quite straightforward:

fread("20170630_Remuneracao.csv", encoding = 'Latin-1', dec = ',')

Error in fread("r.csv", encoding = "Latin-1", dec = ",") : 
  embedded nul in string: '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\000'

Approaches tried:

  1. sed 's/\\0//g' 20170630_Remuneracao.csv > 20170630_Remuneracao.csv, as suggested here.
    When I read the file again, it gives the same error as before.

  2. The following code, from this answer:

    file <- "file.csv"
    tt <- tempfile()  # or tempfile(tmpdir="/dev/shm")
    system(paste0("tr < ", file, " -d '\\000' >", tt))
    fread(tt)

    The code above solves the embedded nul error, but introduces a new error:

    Expected sep (' ') but new line or EOF ends field 28 on line 28333 when reading data
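For reference, a minimal shell sketch of the two checks implied by the attempts above: removing NUL bytes into a separate output file, and listing the lines whose field count differs from the header's. Two details of attempt 1 are worth noting: in a sed pattern, `\\0` matches a literal backslash followed by `0` rather than a NUL byte, and redirecting output onto the input file normally truncates it before it is read. The function names `count_nul`, `strip_nul`, and `odd_lines` are hypothetical helpers made up for illustration, and the separator passed to `odd_lines` is an assumption, since the real separator of the file is not shown here.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; none of these names come from
# data.table or any other package.

# Count the NUL bytes in file $1: delete everything except NUL, count the rest.
count_nul() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Write a NUL-free copy of $1 to $2. The output must be a different file,
# because redirecting onto the input would truncate it before it is read.
strip_nul() {
    if [ "$1" = "$2" ]; then
        echo "strip_nul: input and output must differ" >&2
        return 1
    fi
    tr -d '\000' < "$1" > "$2"
}

# Print "line_number field_count" for each line whose field count differs
# from the header's. $1 = file, $2 = separator (assumed to be one character).
# Quoted fields that contain the separator are not handled by this sketch.
odd_lines() {
    awk -F"$2" 'NR == 1 { n = NF; next } NF != n { print NR, NF }' "$1"
}
```

Possible usage, assuming a semicolon separator: `strip_nul 20170630_Remuneracao.csv clean.csv`, then `count_nul clean.csv` to confirm that no NUL bytes remain, then `odd_lines clean.csv ';'` to see whether line 28333 really has fewer fields than the header.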

Any suggestions?
Thanks!

EDIT
After reading this answer, I suspected that I might have the same problem, so I opened the file with LibreOffice Calc and saved it again in the appropriate CSV format. The file can be read this way, but since that is not a proper solution, I will leave this question open, hoping for a solution that can be applied programmatically.

Hannon Queiroz