Problem

So I have this large text file which contains <0x00> characters (see picture below).

Sublime Text 3 with <0x00> including replace option

As you can see in the picture above, I have tried to remove those characters with the regular expression \x00. I have also tried \0 and \00, with no success.

However, when I try to replace these characters in Sublime Text, a pop-up shows that these <0x00> characters indeed have been found (see picture below), so far so good.

occurrences found

Unfortunately, when I click the "replace" button the characters are not replaced.

Question

How can I get rid of these <0x00> characters?

P.S. It is important to mention that I cannot simply search for "0", since this text file contains zeros that I would like to keep.

Progress update #1

I have managed to copy a <0x00> char into the "find" search box (see picture below).

However, even when I try to replace this character with an empty string, nothing changes in the text file.

progress update 1

Progress update #2 (solution)

Without the helping hand of @00 I wouldn't have found the answer to this problem, thank you!

Explanation

The file was encoded in UTF-16, but I assumed it was UTF-8. Sublime Text 3 opened it as UTF-8 with BOM, which is exactly why I was not able to delete the <0x00> (NUL) characters there.
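
To illustrate the explanation above, here is a minimal Python sketch (not part of the original troubleshooting) of why UTF-16 text seems to have a NUL after every character when it is read as UTF-8. The sample string is made up, and little-endian byte order is an assumption.

```python
# Hypothetical sample text; any ASCII string shows the same pattern.
sample = "Hi"

# UTF-16 little-endian stores each ASCII character as two bytes:
# the character itself followed by a 0x00 byte.
utf16_bytes = sample.encode("utf-16-le")
print(utf16_bytes)  # b'H\x00i\x00'

# Decoding those bytes as UTF-8 keeps every 0x00 as its own NUL character,
# which an editor may then display as <0x00>.
misread = utf16_bytes.decode("utf-8")
print(misread.count("\x00"))  # 2
```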

Solution

Execute in 'bash' or in a 'terminal' the following command:

sed -i 's/\x0//g' [textfile_name].txt
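
For anyone without sed at hand, the same byte-level idea can be sketched in Python. This is only an illustration under assumptions: `strip_nul_bytes` is a hypothetical helper name, and the demo file is a throwaway created just for the example.

```python
import tempfile
from pathlib import Path


def strip_nul_bytes(path: Path) -> int:
    """Remove every 0x00 byte from the file in place; return how many were removed."""
    data = path.read_bytes()
    cleaned = data.replace(b"\x00", b"")
    path.write_bytes(cleaned)
    return len(data) - len(cleaned)


# Demo on a temporary file that mimics UTF-16 (little-endian, no BOM) content.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt"
    demo.write_bytes("10 zeros".encode("utf-16-le"))
    removed = strip_nul_bytes(demo)
    result = demo.read_bytes()

print(removed, result)  # 8 b'10 zeros'
```

The literal "0" digit survives because only the byte value 0x00 is removed, not the character "0" (byte 0x30). Note that this byte-level approach only yields clean text when every character is in the ASCII range, and it leaves any byte order mark at the start of the file untouched.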

Kamuffel
  • Are those literally `<0x00>` or just markers denoting ␀ characters? – James Parsons Jul 20 '18 at 19:58
  • @00 I am not sure if I am right, but I do think that they are markers, since they are grayed out. However, when I click on such a char it selects the whole character, instead of just a "single" char of `<0x00>`. – Kamuffel Jul 20 '18 at 20:02
  • It sounds like they are NULs. Your file must have been corrupted. Unfortunately, if the `\0` regex doesn't seem to be working, you may have to get creative and use an external program to remove them, or pull from backup / source control – James Parsons Jul 20 '18 at 20:14
  • @00 I am sure now that they are NULs, since 0x00 in hex denotes NUL. However, you say that my file may be corrupted. How can it be that I can replace "regular" text, e.g. the word "english" with "spanish", but cannot replace `<0x00>`? – Kamuffel Jul 20 '18 at 20:25
  • I've managed to get a find / replace to work and wrote it up in an answer. Try a regex of `\0` again – James Parsons Jul 20 '18 at 20:28
  • @Kamuffel - not only does your solution not work for me, but it doesn't indicate how one might approach this in a file with variances from your original use case. For example the characters `<0x0c>` `<0x18>` need to be identified individually. – jml Nov 23 '19 at 03:58

1 Answer

OK, I've tried this out myself and it seems that a regular expression will work. Make sure you have the regex option selected (highlighted in the image) and use a regex of \0:

Sublime

Now just make sure you have nothing in the Replace field and hit Replace All. The NUL characters should be gone.

NOTE

While reading around, it seems that you have a NUL after every other character, which might indicate that the file is actually UTF-16 (in which case you do not want to remove them) and needs to be reloaded as such. If switching to UTF-16 and my solution above do not work, this thread may be of use to you.
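
If the file really is UTF-16, one alternative to stripping bytes is to decode it as UTF-16 and write it back as UTF-8. The following is a minimal sketch, assuming the file starts with a byte order mark; `utf16_to_utf8` and the file names are hypothetical and used only for illustration.

```python
import tempfile
from pathlib import Path


def utf16_to_utf8(src: Path, dst: Path) -> None:
    """Decode src as UTF-16 (byte order taken from its BOM) and write dst as UTF-8."""
    text = src.read_bytes().decode("utf-16")
    dst.write_bytes(text.encode("utf-8"))


# Demo on temporary files; the sample text is made up.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "utf16.txt"
    dst = Path(tmp) / "utf8.txt"
    src.write_bytes("caf\u00e9 10\n".encode("utf-16"))  # the encoder prepends a BOM
    utf16_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted)  # b'caf\xc3\xa9 10\n'
```

Unlike removing the 0x00 bytes, decoding keeps non-ASCII characters intact and drops the BOM, so the result contains no NUL characters and no leftover marker bytes.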

James Parsons
  • Thank you for your answer, your solution didn't work indeed. However, you did mention that the file is UTF-16, which is true! Thanks for noting that and referring to the other thread, which contained the solution I needed. I got it fixed with the following command: `sed -i 's/\x0//g' [textfile_name].txt` – Kamuffel Jul 20 '18 at 20:44
  • No problem, happy to have helped! – James Parsons Jul 20 '18 at 20:46
  • @Kamuffel - this solution worked for me and is the appropriate answer per your question. Please mark correct. – jml Dec 12 '19 at 21:03
  • Hey, I don't know if this changed with a newer version of Sublime, but I had to enable RegExp search, as you explained, and then search for `\00` (or `\01`, etc.). Searching for `\0` did not work for me. – Ueffes Sep 30 '21 at 07:10
  • btw, can anyone explain what this is...? and what do we call it? "weird characters in my text file" isn't enough... – Ulf Gjerdingen Jun 01 '22 at 20:57
  • @UlfGjerdingen see the note about encoding; it's likely OP was opening a UTF-16 encoded file as something else – James Parsons Jun 06 '22 at 12:54