I want to inspect a file before processing it in R.

My input file may be broken and contain nul values because of a software crash, so I want my script to check the data before it continues processing.

If I call `read.csv(..., skipNul = TRUE)`, the nuls are skipped and the script doesn't stop. That is bad, because I never notice that data is missing.

If I call `read.csv(..., skipNul = FALSE)`, the nuls are skipped anyway, and I only get a warning message.

I want to count the nuls inside the file. How can I do this?

I tried changing the encoding, but the nuls never show up when I print the data frame.

Stefan L.
  • Can you share an example file? – user2974951 Feb 03 '20 at 09:05
  • You can find an example here: [file](https://webmail.freenet.de/Cloud/?shareToken=b4bf2b1a05dbfad38db89c451b90c6aa107e2d9ccdd891a55a1c54458343cfc7) – Stefan L. Feb 03 '20 at 09:51
  • Doesn't work for me. It would be better if you pasted the first few rows which are relevant of your data set into your question using `dput()`. – user2974951 Feb 03 '20 at 09:56
  • This wouldn't work. The problem is that you cannot see nul values in your dataset, so a dump to a file wouldn't help either. But you can try to read the file with: `DF <- read.csv(file="Example.log", blank.lines.skip = TRUE, header = FALSE, skipNul = FALSE, encoding = "UTF-16", allowEscapes = TRUE)` – Stefan L. Feb 03 '20 at 10:11

1 Answer

I think I found a solution here: Removing nul characters

I can read the raw data with `DF <- readBin("Example.log", raw(), file.info("Example.log")$size)`.

The resulting vector contains every byte of the file as a raw value. I just had to filter for the nuls and take the length: `length(DF[DF == 0])`
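The two steps above can be wrapped into a small helper. This is only a minimal sketch: the function name `count_nuls` and the temporary demo file are made up for illustration and are not part of the original script; only `readBin()`, `file.info()`, and `writeBin()` are base R.

```r
# Count NUL (0x00) bytes in a file by reading it as raw bytes.
# `count_nuls` is a hypothetical helper name, not a base R function.
count_nuls <- function(path) {
  size <- file.info(path)$size                    # file size in bytes
  bytes <- readBin(path, what = "raw", n = size)  # one raw value per byte
  sum(bytes == as.raw(0))                         # number of NUL bytes
}

# Demo on a small temporary file that contains three NUL bytes.
tmp <- tempfile(fileext = ".log")
writeBin(as.raw(c(0x61, 0x2c, 0x62, 0x0a,   # "a,b\n"
                  0x00, 0x00,               # two NUL bytes
                  0x31, 0x2c, 0x00, 0x0a)), # "1," NUL "\n"
         tmp)

n_nul <- count_nuls(tmp)
print(n_nul)
```

In a processing script, a check such as `if (count_nuls("Example.log") > 0) stop("file contains nul bytes")` could run before `read.csv()`. Note that this counts bytes, so a file that really is UTF-16 encoded would report a nul byte for every ASCII character even when it is intact.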

Stefan L.