
I have a set of files that have null bytes at roughly every second byte. They also start with the byte sequence FF FE, which needs to be removed. I was given the files (they are actually router configuration files, so plain text), and I have no idea how those bytes got in there. However, the file looks like this, which I found for example here or here:

FF FE 65 00 6E 00 61 00 62 00 6C 00 65 00 0D 00
0A 00 63 00 6F 00 6E 00 66 00 69 00 67 00 75 00
72 00 65 00 20 00 74 00 65 00 72 00 6D 00 69 00

I think you can imagine how it goes on. I tried various things to remove the null bytes and the first two bytes as well:

  • sed -i.bak 's/\x00//g' R2.txt does nothing.
  • LC_ALL=C sed -i.bak 's/\x00//g' R2.txt does nothing.
  • LC_ALL=C tr < R2.txt -d '\000' > R2-test.txt works, but does not edit the file in place.
  • LC_ALL=C sed -i.bak $'s/\x00//g' R2.txt complains: sed: 1: "s/": unterminated substitute pattern

So... my question is: how do I remove the null bytes and the first two bytes (FF FE) from this file, in place, on OS X?

Thanks.

Daniel
  • Can you please open the file in vi and tell me what you are seeing? Check for this sign: ^@. You can also do :set list to show all invisible characters. – z atef Sep 23 '16 at 18:51
  • Also see the suggestions here: http://stackoverflow.com/questions/2398393/identifying-and-removing-null-characters-in-unix – z atef Sep 23 '16 at 18:52
  • Interestingly, vi shows no such characters. It shows ^M at the end of every line, which however does not explain the problem. vi also shows [converted] at the bottom. – Daniel Sep 23 '16 at 19:00
  • First thing to do: open the file in vi(m) and do :%s//\r/g so that you can get rid of the Windows carriage returns and convert them to *nix ones. – z atef Sep 23 '16 at 19:07
  • Is your file showing as one line? – z atef Sep 23 '16 at 19:08
  • No, it's not, there are newlines in between. Interestingly, on a Debian system, the sed examples work great. – Daniel Sep 23 '16 at 19:15
  • Someone suggested: a large number of unwanted NUL characters, say one every other byte, indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8. So you may want to run the file through iconv. – z atef Sep 23 '16 at 19:27
  • Interesting idea. `iconv -f UTF-8 -t UTF-16 R3.txt` results in `iconv: R3.txt:1:0: cannot convert` – Daniel Sep 23 '16 at 19:33
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/124070/discussion-between-z-and-waza-ari). – z atef Sep 23 '16 at 19:35

1 Answer

You said "null bytes roughly at every second byte slot". This is a clear indication that the file is encoded as UTF-16LE (the leading FF FE is the UTF-16 little-endian byte order mark), so you will need to convert it to UTF-8, ASCII, etc. Try something like the below:

iconv -f "current-encoding"  -t "desired-encoding" infile.txt > outfile.txt 
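
For the file in the question, a concrete version might look like the sketch below. It assumes the file really is UTF-16 with a leading FF FE byte order mark, and that your iconv accepts the encoding names UTF-16 and UTF-8 (check with iconv -l). The function name utf16_to_utf8_inplace and the temp-file handling are only an illustration, not something from the discussion. Because iconv writes to standard output here, the "in place" part is done by writing to a temporary file and moving it over the original. As far as I know, -f UTF-16 uses the byte order mark to pick the byte order and does not copy it to the output.

```shell
#!/bin/sh
# Sketch: convert a UTF-16 file (with byte order mark) to UTF-8 and
# replace the original file with the result.
# utf16_to_utf8_inplace is a hypothetical helper name.
utf16_to_utf8_inplace() {
    f=$1
    # Temp file next to the original so that mv stays on one filesystem.
    tmp=$(mktemp "${f}.XXXXXX") || return 1
    if iconv -f UTF-16 -t UTF-8 "$f" > "$tmp"; then
        # Note: the result gets the temp file's permissions.
        mv "$tmp" "$f"
    else
        # Conversion failed: keep the original untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would be utf16_to_utf8_inplace R2.txt. The ^M carriage returns stay in the file; piping through tr -d '\r' would be one way to drop them.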
z atef
  • Thanks for your help in chat, it was actually utf-16le. On OS X, I was not able to use iconv to convert the file. It worked using Debian, though. – Daniel Sep 23 '16 at 20:04
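
If iconv is not an option, as in the comment above, the tr command from the question (the one that worked) can be combined with tail to skip the first two bytes. This is only a sketch under strong assumptions: the text must be pure ASCII stored as UTF-16LE, and the first two bytes are dropped blindly without checking that they are FF FE. The function name strip_bom_and_nuls is made up for illustration. As with the iconv variant, the "in place" effect comes from a temporary file that replaces the original.

```shell
#!/bin/sh
# Sketch: drop the first two bytes and every NUL byte, then replace
# the original file. Only valid for ASCII-only text in UTF-16LE.
# strip_bom_and_nuls is a hypothetical helper name.
strip_bom_and_nuls() {
    f=$1
    tmp=$(mktemp "${f}.XXXXXX") || return 1
    # tail -c +3 starts output at the third byte (skips FF FE);
    # tr -d '\000' removes all NUL bytes, as in the question.
    if tail -c +3 "$f" | LC_ALL=C tr -d '\000' > "$tmp"; then
        mv "$tmp" "$f"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would be strip_bom_and_nuls R2.txt. Any non-ASCII character in the file would be mangled by this approach, which is why converting with iconv is the cleaner route where it works.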