58

I have a file with some non-printable characters that come up as ^C or ^B, I want to find and replace those characters, how do I go about doing that?

Charles Ma

7 Answers

90

Removing control symbols only:

:%s/[[:cntrl:]]//g

Removing non-printable characters (note that in Vim versions before roughly 8.1.1 this also removes non-ASCII characters):

:%s/[^[:print:]]//g

The difference between the two shows up when the text contains characters that are non-printable but are not control characters, e.g. a zero-width space (U+200B).
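To see why a zero-width space is non-printable yet not a control character, here is a minimal sketch outside Vim. It assumes Python's `unicodedata` module as a stand-in for Unicode character properties; Vim's own character classes are implemented separately and may not agree with it in every case. The helper name `describe` is made up for the illustration.

```python
import unicodedata

def describe(ch):
    """Return (Unicode general category, str.isprintable()) for one character."""
    return unicodedata.category(ch), ch.isprintable()

# Hypothetical sample characters chosen for the illustration.
samples = {
    "ETX (^C)": "\x03",            # a classic control character
    "zero-width space": "\u200b",  # invisible, but not a control character
    "letter a": "a",
}

for name, ch in samples.items():
    category, printable = describe(ch)
    print(f"{name}: U+{ord(ch):04X} category={category} printable={printable}")
```

In Unicode terms, `Cc` is the "control" category and `Cf` is "format"; a pattern that targets only control characters would leave a `Cf` character in place, while a "not printable" test would catch both.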

john c. j.
lincz
  • At least until vim 7.3 [:print:] only matches ASCII printable characters (edited the answer to alert readers about this fact) – ndemou Feb 10 '15 at 22:00
  • @ndemou This is tricky, with the [ ] around the [:print:] the ^ should invert the match and return any non-printable. Or perhaps that was your edit? – dragon788 Mar 02 '16 at 00:43
  • @dragon788, yes I was aware of how it works when I wrote my comment. Try the 2nd regex on text with printable Unicode characters outside the ASCII table to understand my comment (it will remove the Unicode characters). – ndemou Mar 02 '16 at 12:16
  • @ndemou Has it changed in recent versions of Vim? I don't see any difference in the behavior of these two regexes in 8.1.1. Here is the screenshot: https://imgur.com/a/NHS5EHr . Also, it doesn't remove non-ASCII characters, e.g. Chinese and Russian (the last part isn't something bad, just to point out the difference). – john c. j. Feb 03 '19 at 20:26
  • @john-c-j I've just confirmed your observation. It was a bug anyway to consider printable Unicode characters as non printable. -- updated the answer (& waiting for peer review) – ndemou Feb 04 '19 at 13:36
  • @ndemou But what is the difference between current behavior of these two regular expressions? When I tested them, they do exactly the same things, the result is shown in my screenshot posted above. – john c. j. Feb 04 '19 at 20:32
  • @john-c-j I can't think of a difference. However why do you think that there should be one? `[[:print:]]` and `[[:cntrl:]]` are by definition complementary so it's natural that `[^[:print:]] == [[:cntrl:]]` and `[^[:cntrl:]] == [[:print:]]` (remember that `[^[:xyz:]]` means any character except `[[:xyz:]]`) . – ndemou Feb 05 '19 at 08:26
  • @ndemou "However why do you think that there should be one?" -- Well, because as stated in the answer, the first regex removes *control symbols*, whereas the second removes *non-printable characters*. As a corollary it could be said that there should be some difference between control symbols and non-printable characters. – john c. j. Feb 05 '19 at 10:46
  • @john-c-j sure but I can't think of a non-printable character that is not a control character. I'm maybe too tired though :-) – ndemou Feb 05 '19 at 15:05
  • @ndemou Well, I was looking for an example that easily shows the difference between them. And I found it :-) The difference between them can be seen if you have some non-printable-non-control characters, e.g. [zero-width space](https://en.wikipedia.org/wiki/Zero-width_space): https://imgur.com/a/lqG0SlX. I edited the answer, currently awaiting peer review. – john c. j. Feb 05 '19 at 15:47
  • @ndemou See here: https://vi.stackexchange.com/questions/18807 if you need to automate it without losing cursor position. – john c. j. Feb 07 '19 at 22:25
  • I like that this solution answers the problem stated in the question's title, not just the specific characters mentioned in the body of the question. – Mr. Lance E Sloan Apr 27 '21 at 15:03
49

Say you want to replace ^C with C:

:%s/<Ctrl-V><Ctrl-C>/C/g

Here <Ctrl-V><Ctrl-C> means: hold Ctrl and press V, then C.

Ctrl-V lets you enter control characters literally.

ndemou
ars
15

Try this after saving your file in Vim (assuming you are in a Linux environment):

:%!tr -cd '[:print:]\n'
ticktock
  • @JamesAndino: `:%` filters all lines through the external (`!`) program `tr`, which _removes_ (`-d`) all characters that are _not_ (`-c`) _printable_ (`[:print:]`) or _newline_ (`\n`). – quasimodo Feb 07 '14 at 18:08
  • This isn't Unicode friendly, as it is a POSIX character class (http://en.wikipedia.org/wiki/Regular_expression#Character_classes). So if you have YAML with data like 你好, `tr` will strip the Unicode data when using `[:print:]`. – atp Feb 10 '14 at 01:40
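If the Unicode limitation of `tr` is a problem, the same "keep printable characters and newlines" filter can be sketched in Python, whose `str.isprintable()` is Unicode-aware. This is only an illustration of the idea, not a drop-in replacement for the command above; the function name `keep_printable` is made up here.

```python
def keep_printable(text: str) -> str:
    """Drop every character that is neither printable nor a newline."""
    return "".join(ch for ch in text if ch.isprintable() or ch == "\n")

# Sample with CJK text (U+4F60 U+597D), a ^C, a zero-width space and a TAB.
sample = "\u4f60\u597d\x03\u200b!\tend\n"

# ascii() keeps the demo output safe on terminals without UTF-8 support.
print(ascii(keep_printable(sample)))
```

Like `tr -cd '[:print:]\n'`, this also drops TAB characters, so add `"\t"` to the condition if tabs should survive. Wrapped in a small script that reads stdin and writes stdout, it could in principle be used as a `:%!` filter in the same way as `tr`.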
10

None of the answers here using Vim's character classes for control characters worked for me. I had to enter an explicit Unicode code point range.

:%s/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]//g

That Unicode range comes from this other post: https://stackoverflow.com/a/8171868/231914
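For reference, the same range can be tried outside Vim. The sketch below assumes Python's `re` module, where the character class happens to have the same syntax; the names `CONTROL_CHARS` and `strip_controls` are made up for the example.

```python
import re

# Same code point ranges as the Vim pattern above: the C0 controls except
# TAB (0x09), LF (0x0A) and CR (0x0D), plus DEL (0x7F) and the C1 controls.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]")

def strip_controls(text: str) -> str:
    """Remove the control characters covered by the range, keep the rest."""
    return CONTROL_CHARS.sub("", text)

sample = "a\x02b\x03c\td\r\ne\x7f\x9f"
print(ascii(strip_controls(sample)))
```

The gaps in the range are deliberate: tabs and line endings are left untouched, which is usually what you want in a text file.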

Community
Dalin
  • Because TAB is considered non-printable, both [[:cntrl:]] and [^[:print:]] match TAB (0x09, Ctrl-I). – mosh Jan 15 '17 at 03:09
5

You can use:

:%s/^C//g

To get the ^C, hold the Control key and press V, then C (both while holding Control); the ^C will appear. This finds all occurrences and replaces them with nothing.

To remove both ^C and ^B you can do:

:%s/^C\|^B//g
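For comparison, deleting just these two characters outside Vim can be sketched in Python with `str.translate`. This assumes ^B and ^C are the ASCII control characters 0x02 (STX) and 0x03 (ETX); the names below are made up for the example.

```python
# Map the code points of ^B (0x02) and ^C (0x03) to None so that
# str.translate() deletes them and leaves every other character alone.
DELETE_CTRL_B_C = {0x02: None, 0x03: None}

def remove_ctrl_b_c(text: str) -> str:
    """Return text with every ^B and ^C character removed."""
    return text.translate(DELETE_CTRL_B_C)

print(ascii(remove_ctrl_b_c("foo\x03bar\x02baz")))
```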
codaddict
5

You can use the CTRL-V prefix to enter them, or if they're not easily typeable, yank and insert them using CTRL-R ".

Pi Delport
5

An option not mentioned in other answers.

Delete a specific Unicode character by its hex code point, e.g. <200b>:

:%s/\%U200b//g
Gebb