39

I have a large file (3.5G) that I'm trying to import using data.table::fread.

It was originally created from an rpt file that was opened as text and saved as a CSV.

This has worked fine with smaller files (of the same type of data-same columns and all. This one is just for a longer timeframe and wider reach).

When I try and run

mydata <- fread("mycsv.csv")

I get the error:

Error in fread("mycsv.csv") : embedded nul in string: 'y\0e\0a\0r\0'

What does this mean?

zx8754
  • 52,746
  • 12
  • 114
  • 209
datahappy
  • 826
  • 2
  • 11
  • 29
  • 3
    It looks like UTF-16 (2 or more bytes for every character, almost always 2 bytes). For ASCII, you'll see the ASCII value and then a 0 byte. – Matt Chambers Aug 25 '14 at 22:23
  • 2
    I have had the same problem when loading RData files generated from large data sets. I regenerate the RData and the problem goes away. Seems to be caused when my RAM is maxing out. – Cyrille Oct 27 '14 at 11:35
  • This error also occurs when you try to open an RDS file, inappropriately, with `fread`. Don't ask how I know that. – Brandon Bertelsen Dec 18 '16 at 08:24

6 Answers6

22

We can remove the null terminators on the command line using something like:

sed 's/\\0//g' mycsv.csv > mycsv.csv

Or as suggested by @marbel, fread allows you to pass the sed call inside the text. Such as:

fread("sed 's/\\0//g' mycsv.csv")
zx8754
  • 52,746
  • 12
  • 114
  • 209
Robert Krzyzanowski
  • 9,294
  • 28
  • 24
  • 2
    Thanks for the reply. I'm not exactly sure how to enter that in R. One thing I realized, apparently, the error is telling me that the embedded nul is in the string 'year', which is the first word of the first line of the csv (it is the first column heading). The full error is [ Error in fread("allpcts90days1.csv") : embedded nul in string: 'y\0e\0a\0r\0'] Any idea what that's telling me? When I look at the text in a text editor (TextPad or UltraEdit, since they can handle files of this size), I don't see anything out of the ordinary. – datahappy Mar 25 '14 at 19:34
  • 1
    Your text file has characters that are reserved in ASCII for terminating strings. You need to run that from the linux command line, or if you're on Windows, Google for "how to remove special characters from file windows". – Robert Krzyzanowski Mar 25 '14 at 19:37
  • 26
    `fread` allows you to pass the sed call inside the text. Such as: `fread("sed 's/\\0//g' mycsv.csv")` – marbel Oct 08 '14 at 00:59
  • It says: `'sed' is not recognized as an internal or external command` and in addition I'm getting an "empty file error" – mRiddle Apr 03 '18 at 12:42
  • I think `sep` might be the argument @RobertKrzyzanowski referred to? – Ben Nov 16 '21 at 22:12
  • `sed` is a Unix utility, it won't work on Windows unless you have WSL installed/activated, and if you don't know what `sed` is then you probably haven't activated WSL. I don't know how to do these things on windows, either; but there should be a way. Robert suggested googling it. Otherwise, installing WSL and running that command on the Linux terminal in the relevant folder should not be too cumbersome. Trying to use it inside the `fread` call will be more difficult. – Fons MA Nov 25 '21 at 20:51
12

In this case, you can use read.csv with fileEncoding of UTF-16LE rather than fread.

read.csv("mycsv.csv",fileEncoding="UTF-16LE")

Considering your data size, using read.csv would take a couple of minutes, but I think it is not a big deal.

Fan Wang
  • 341
  • 3
  • 6
5

You can test this small function:

cleanFiles<-function(file,newfile){
  writeLines(iconv(readLines(file,skipNul = TRUE)),newfile)
}

It's work for me

xrsousa
  • 51
  • 1
  • 1
4

A non-technical way to solve this would be, to

  1. Open the problematic .csv

  2. Ctrl+A (Select all)

  3. Open new Excel sheet

  4. Right click and choose 'Paste as values'

  5. Save and use this file in place of original one.

Worked for me, and doesn't take much time.

zx8754
  • 52,746
  • 12
  • 114
  • 209
Pree
  • 77
  • 10
2

If you are seeing NUL (x00) characters in an ASCII file you can do this: data.table::fread(text = readLines(pathIn, skipNul = T), ...)

dpel
  • 1,954
  • 1
  • 21
  • 31
Jim Cutler
  • 63
  • 4
1

I ran into a similar error, sharing in case others run into the same issue -

  embedded nul in string: '\0HA\xa8S\001\0\0\0\xd8@\xa8S\001\0\0\0h@\xa8S\001\0\0\0\xf8?\xa8S\001\0\0\0\x88'
Calls: as.data.table -> fread

The cause of this ended up being different column lengths where my first column (headers) were shorter than the rest.

Rachel
  • 143
  • 2
  • 7