
I'm on a Mac with RStudio 0.99.489 and R 3.2.2. I have a 1 GB csv file. That's not exactly big, but it still takes around 5 minutes to import with read.csv, and I have a lot of files of this size, so I tried fread() instead, which fails with an 'embedded nul in string' error. From reading previous questions, I learned that this error might be caused by missing values in the date column (a normal entry looks like '03May1995:15:31:50', but where the error occurs the value looks like '05May').

I tried sed 's/\\0//g' mycsv1.csv > mycsv2.csv as mentioned in 'Embedded nul in string' error when importing csv with fread, but the same error message still pops up.

sed -i 's/\\0//g' /src/path/mycsv.csv simply doesn't work for me: the terminal reports an error for this command (I'm not very familiar with the command line, so I don't understand the logic behind these commands).

I tried

file <- "file.csv"
tt <- tempfile()  # or tempfile(tmpdir="/dev/shm")
system(paste0("tr < ", file, " -d '\\000' >", tt))
fread(tt)

from 'Embedded nul in string' when importing large CSV (8 GB) with fread(). I guess it removed the entries where there is a missing value, because when I run fread(tt), R says

Error in fread(tt) : 
  Expecting 5 cols, but line 5060627 contains text after processing all cols. It is very likely that this is due to one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes.
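One way to narrow this down is to look at the reported line directly. The following is a minimal sketch, not a definitive diagnosis: it builds a small stand-in file (the sample data, the variable names, and the expected column count of 3 are all assumptions for illustration), lists the lines whose comma-separated field count differs from the expected one, and dumps the raw bytes of one such line. The field count is a naive split on commas and ignores quoting.

```shell
# Hypothetical stand-in for the real CSV: line 3 has one field too many.
sample=$(mktemp)
printf 'a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n' > "$sample"

# Assumed column count for this sample (the real file in the question has 5).
expected=3

# Print the line number of every row whose field count differs from expected.
# Naive: splits on every comma, so quoted commas are counted too.
bad_lines=$(awk -F',' -v n="$expected" 'NF != n { print NR }' "$sample")
echo "lines with unexpected field count: $bad_lines"

# Show the raw bytes of one suspicious line; stray bytes such as \0 or \r
# become visible in the od output. The q stops sed once the line is printed.
line_no=3
sed -n "${line_no}{p;q;}" "$sample" | od -c
```

For the file in the question, the same idea would use the real path, an expected count of 5, and line 5060627.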

After that, I tried iconv -f utf-16 -t utf-8 myfile1.csv > myfile2.csv, because it seemed the problem might be that fread can't handle UTF-16. There might be something wrong with my command, but it simply gave me a spreadsheet full of random symbols.
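Converting from UTF-16 only makes sense if the file really is UTF-16. If it is not, iconv will misread the bytes, which may be what produced the random symbols. A rough check, sketched below under the assumption that a UTF-16 file starts with a byte-order mark (not all do), is to look at the first two bytes before converting. The sample file and variable names are made up for illustration.

```shell
# Hypothetical UTF-16LE sample: BOM (ff fe) followed by "a,b\n", two bytes per character.
sample=$(mktemp)
converted=$(mktemp)
printf '\377\376a\000,\000b\000\n\000' > "$sample"

# Read the first two bytes as hex, e.g. "fffe".
bom=$(head -c 2 "$sample" | od -An -tx1 | tr -d ' \n')

case "$bom" in
  fffe) enc="UTF-16LE" ;;
  feff) enc="UTF-16BE" ;;
  *)    enc="unknown"  ;;   # no UTF-16 BOM: converting from UTF-16 is probably wrong
esac
echo "detected: $enc"

# Convert only when a UTF-16 BOM was found; "-f UTF-16" lets iconv use the BOM.
if [ "$enc" != "unknown" ]; then
  iconv -f UTF-16 -t UTF-8 "$sample" > "$converted"
fi
```

If the result is "unknown", the NUL bytes are more likely stray padding inside an otherwise single-byte file than a sign of UTF-16.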

And I saw this

vim filename.csv

:%s/CTRL+2//g

ESC  #TO SWITCH FROM INSERT MODE

:wq   # TO SAVE THE FILE

from Error with fread in R--embedded nul in string: '\0', but after I typed vim filename.csv, the terminal just loaded the whole spreadsheet and I couldn't type the second command (:%s/CTRL+2//g). Again, I don't really understand these commands, so maybe I need to adjust them for my situation.

Thanks for the help!

ttothef

1 Answer


Try

sed -i 's/\x0//g' my_file

or

cat my_file|tr -d '\000' > new_file
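On macOS the sed and tr that ship with the system are the BSD versions, so the commands above may behave differently than on Linux. The sketch below uses only tr with input redirection and counts the NUL bytes before and after, so it is easy to see whether anything was removed. The sample file and variable names are assumptions for illustration; LC_ALL=C is set so tr treats the data as raw bytes.

```shell
# Hypothetical stand-in for the real CSV, containing two NUL bytes on line 2.
sample=$(mktemp)
cleaned=$(mktemp)
printf 'id,date\n1,05May\000\000\n2,03May1995:15:31:50\n' > "$sample"

# Count NUL bytes: delete everything that is NOT a NUL, then count what is left.
nul_before=$(LC_ALL=C tr -cd '\000' < "$sample" | wc -c | tr -d ' ')

# Strip the NUL bytes. tr reads standard input only, so the file is passed
# with "<" rather than as an argument.
LC_ALL=C tr -d '\000' < "$sample" > "$cleaned"

nul_after=$(LC_ALL=C tr -cd '\000' < "$cleaned" | wc -c | tr -d ' ')
echo "NUL bytes before: $nul_before, after: $nul_after"
```

Removing the NUL bytes does not repair a row that was already truncated or split, so fread may still report a column-count problem on the cleaned file.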
repzero
  • for sed -i 's/\x0//g' my_file, terminal returns: sed: 1: "my_file.csv": invalid command code C. For the 2nd one, terminal returns "usage: tr [-Ccsu] string1 string2 tr [-Ccu] -d string1 tr [-Ccu] -s string1 tr [-Ccu] -ds string1 string2" and the same error message of 'embedded nul in string' still shows. – ttothef Dec 22 '15 at 03:52
  • it's a csv file, I'm using mac and I tried in terminal – ttothef Dec 22 '15 at 03:55
  • replace "my_file" with the path to your file – repzero Dec 22 '15 at 03:58
  • oh! i'm already in the directory where I stored my data, do I still need to put in the path? – ttothef Dec 22 '15 at 04:07
  • I don't know if the file is in a sub directory..what you can do is drag the file from the display window into the shell where it should be placed – repzero Dec 22 '15 at 04:10
  • I tried that, still doesn't work... and for 'cat my_file|tr -d '\000' > new_file', when I use fread(), R gives me the error message that 'Expecting 5 cols, but line 5060627 contains text after processing all cols.' I'm not sure what to do... – ttothef Dec 22 '15 at 04:20