I don't know PowerShell well enough to fix that, but you can use sed or tr to replace the nuls in the files. The tr and sed utilities are available by default on most (all?) Unix-like OSes, including macOS. On Windows, they are included in Rtools35 and Rtools40.
If you do not find it with Sys.which("tr"), then you may need to include the full path to the respective utility. Assuming Rtools is installed at the root of c:/, the path is something like:
- Rtools35: c:/Rtools/bin/tr.exe
- Rtools40: c:/Rtools40/usr/bin/tr.exe
They are also included in Git-for-Windows as /usr/bin/tr.exe and /usr/bin/sed.exe within git-bash. (On the file system, they are likely under c:/Program Files/Git/usr/bin/.)
(The same locations apply to sed.)
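The lookup described above can be sketched as a small helper: try the PATH first, then fall back to a list of candidate locations. The name find_tool is hypothetical (not part of R or Rtools), and the candidate paths are just the ones mentioned above, which may not match your install.

```r
# Hypothetical helper: return the first usable location of a command-line tool.
find_tool <- function(name, fallbacks = character(0)) {
  on_path <- unname(Sys.which(name))          # "" when not found on the PATH
  if (!is.na(on_path) && nzchar(on_path)) return(on_path)
  existing <- fallbacks[file.exists(fallbacks)]
  if (length(existing) > 0) existing[[1]] else NA_character_
}

# Candidate locations taken from the text above (assumed install directories).
tr_path <- find_tool("tr", c(
  "c:/Rtools/bin/tr.exe",
  "c:/Rtools40/usr/bin/tr.exe",
  "c:/Program Files/Git/usr/bin/tr.exe"
))
tr_path
```

If tr_path is not NA, it can be passed as the first argument to system2 in place of the bare "tr".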
I should note that I'm doing this through R's system2 as a convenience only. If you're comfortable enough on the bash command line, then this is just as easy to perform there instead.
data generation
I don't know where the nuls are in your file, so I'll assume that they are record (line) terminators. That is, in most files you'll see each line ending with \n or \r\n, but for this example I'll replace the \n with \0 (nul).
charToRaw("a|b\nhello|world")
# [1] 61 7c 62 0a 68 65 6c 6c 6f 7c 77 6f 72 6c 64
ch <- charToRaw("a|b\nhello|world")
ch[ch == charToRaw("\n")] <- as.raw(0)
ch
# [1] 61 7c 62 00 68 65 6c 6c 6f 7c 77 6f 72 6c 64
writeBin(ch, "raw.txt")
readLines("raw.txt")
# Warning in readLines("raw.txt") :
# line 1 appears to contain an embedded nul
# Warning in readLines("raw.txt") :
# incomplete final line found on 'raw.txt'
# [1] "a|b"
The nul is a problem (as intended), so we don't see anything after the embedded nul.
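Since I'm only guessing where the nuls are, it may help to look at the raw bytes of your own file first. A minimal sketch, using the example file from above (the variable names are mine): it reports the byte offsets of the nuls and whether the file contains any ordinary newlines at all.

```r
# Recreate the example file: "a|b", a nul, then "hello|world".
ch <- charToRaw("a|b\nhello|world")
ch[ch == charToRaw("\n")] <- as.raw(0)
writeBin(ch, "raw.txt")

# Read the whole file as raw bytes so the nuls are not dropped.
bytes <- readBin("raw.txt", what = "raw", n = file.size("raw.txt"))

nul_at <- which(bytes == as.raw(0))          # 1-based byte offsets of each nul
nul_at
# [1] 4

# No newlines at all suggests the nuls are acting as record terminators.
has_newline <- any(bytes == charToRaw("\n"))
has_newline
# [1] FALSE
```

If has_newline is TRUE on your real file, the nuls are probably something other than line terminators, and the last section of this answer is the relevant one.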
tr
tr doesn't do things in place, so this takes the original file as input and generates a new file. If file size and disk space are a concern, then perhaps sed would be preferred.
system2("tr", c("\\0", "\\n"), stdin = "raw.txt", stdout = "raw2.txt")
readLines("raw2.txt")
# Warning in readLines("raw2.txt") :
# incomplete final line found on 'raw2.txt'
# [1] "a|b" "hello|world"
(That warning is safe to ignore here.)
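One caveat if you run this on a Unix-alike rather than Windows: there, system2 pastes the arguments into a shell command line, so an unquoted \0 or \n can lose its backslash before tr ever sees it. Below is a sketch of a more defensive call, assuming tr is on the PATH. quote_arg is a hypothetical helper name, and leaving the arguments unquoted on Windows simply mirrors the call above.

```r
# Recreate the example file: "a|b", a nul, then "hello|world".
ch <- charToRaw("a|b\nhello|world")
ch[ch == charToRaw("\n")] <- as.raw(0)
writeBin(ch, "raw.txt")

# Hypothetical helper: quote only where a shell will see the arguments.
quote_arg <- function(x) if (.Platform$OS.type == "unix") shQuote(x) else x

# When output goes to a file, system2() returns the exit status (0 = success).
status <- system2("tr", quote_arg(c("\\0", "\\n")),
                  stdin = "raw.txt", stdout = "raw2.txt")

if (status == 0) {
  print(readLines("raw2.txt", warn = FALSE))
} else {
  warning("tr exited with status ", status)
}
```

Checking the exit status is cheap and catches the case where tr was not found.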
sed
sed can optionally work in place with the -i argument. (Without it, it can operate the same as tr: generate a new file based on the original.)
system2("sed", c("-i", "s/\\x0/\\n/g", "raw.txt"))
readLines("raw.txt")
# Warning in readLines("raw.txt") :
# incomplete final line found on 'raw.txt'
# [1] "a|b" "hello|world"
(That warning is safe to ignore here.)
other than record-terminator
If the nul is not the record-terminator (\n-like) character, then you have some options:
- Replace the \0 character with something meaningful, such as Z (stupid, but you get the point). This should use the above commands as-is, replacing the \\n with your character of choice. (tr requires a single character; sed can replace it with multiple characters if you like.)
- Delete the \0 completely, in which case you can use tr -d '\0' and sed -i -e 's/\x0//g' (translated into R's system2 calls as above).
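For concreteness, here is roughly how those two options could look as system2 calls using tr. This is a sketch, assuming tr is on the PATH. quote_arg is a hypothetical helper (it quotes only on Unix-alikes, where a shell sees the arguments), and raw_z.txt and raw_del.txt are made-up output names.

```r
# Recreate the example file: "a|b", a nul, then "hello|world".
ch <- charToRaw("a|b\nhello|world")
ch[ch == charToRaw("\n")] <- as.raw(0)
writeBin(ch, "raw.txt")

# Hypothetical helper: quote only where a shell will see the arguments.
quote_arg <- function(x) if (.Platform$OS.type == "unix") shQuote(x) else x

# Option 1: swap each nul for a placeholder character (Z here).
st_z <- system2("tr", c(quote_arg("\\0"), "Z"),
                stdin = "raw.txt", stdout = "raw_z.txt")

# Option 2: delete the nuls outright, i.e. tr -d '\0'.
st_del <- system2("tr", c("-d", quote_arg("\\0")),
                  stdin = "raw.txt", stdout = "raw_del.txt")

# Only read the results back if tr reported success.
if (st_z == 0) print(readLines("raw_z.txt", warn = FALSE))
# [1] "a|bZhello|world"
if (st_del == 0) print(readLines("raw_del.txt", warn = FALSE))
# [1] "a|bhello|world"
```

The sed versions follow the same pattern, with the file name passed as the last argument instead of through stdin.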