178

I searched a lot, but couldn't find anywhere how to remove non-ASCII characters in Notepad++.

I need to know what command to write in Find and Replace (a picture would be great).

  • What if I want to make a white-list and bookmark all the ASCII words/lines, so that non-ASCII lines stay unmarked?

  • What if the file is quite large, so I can't select all the ASCII lines and just want to select the lines containing non-ASCII characters?

Peter Mortensen
Texh

9 Answers

327

This expression will search for non-ASCII values:

[^\x00-\x7F]+

Select 'Regular expression' under Search Mode, and click Find Next.

Source: Regex any ASCII character
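Outside Notepad++, the same character class can be tried in a script. The following is a minimal sketch in Python, assuming the text is already loaded into a string; the function name `strip_non_ascii` and the sample string are made up for illustration.

```python
import re

# Same pattern as above: one or more characters outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text: str) -> str:
    """Return text with every run of non-ASCII characters removed."""
    return NON_ASCII.sub("", text)


sample = "na\u00efve caf\u00e9"  # hypothetical sample: "naive cafe" with accents
print(NON_ASCII.findall(sample))  # the runs that "Find Next" would stop at
print(strip_non_ascii(sample))    # the text after replacing matches with nothing
```

Note that tab, carriage return and line feed are inside 0x00-0x7F, so this pattern leaves line breaks alone.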

Peter Mortensen
ProGM
69

In Notepad++, if you go to menu Search -> Find characters in range -> Non-ASCII Characters (128-255), you can then step through the document to each non-ASCII character.

Be sure to tick "Wrap around" if you want to loop through the document for all non-ASCII characters.

screenshot "Find in Range"

When you press find it selects the character. Then go to the Edit menu and pick Replace, and the "find" box will be filled with the current selection, which will be the character you found.

Then you can do the rest of the find/replace in the normal dialog.
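Replacing the characters one at a time in the dialog amounts to building a small lookup table. As a rough sketch of that idea in Python: the table `REPLACEMENTS` and both function names are hypothetical, and the check used here is "code point above 127", which is wider than the 128-255 range named in the menu.

```python
# Hypothetical replacement table: each non-ASCII character maps to an ASCII stand-in.
REPLACEMENTS = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}
TABLE = str.maketrans(REPLACEMENTS)


def replace_known(text: str) -> str:
    """Replace every character listed in REPLACEMENTS, leave the rest alone."""
    return text.translate(TABLE)


def remaining_non_ascii(text: str) -> list[str]:
    """List the distinct non-ASCII characters still present, sorted by code point."""
    return sorted({ch for ch in text if ord(ch) > 127})


cleaned = replace_known("\u201cit\u2019s\u201d")
print(cleaned)
print(remaining_non_ascii("caf\u00e9 \u201cok\u201d"))
```

Running `remaining_non_ascii` first gives the list of characters that still need an entry in the table.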

Darren
Anon Y. Mous
  • This works well, but doesn't show all results in a list and no "replace" option – Alex Jul 09 '14 at 19:42
  • Neat... because I always forget the regex for the non-ASCII and have to Google it each time to go back to this page :) – Jean-Francois T. Oct 31 '19 at 01:52
  • So the trick with this is when you press find here it selects the character. Then you just go to the Edit menu and pick Replace, and Notepad++ always fills the "find" box in with the current selection, which will be the character you found. So you can do the rest of the find/replace in the normal dialog. – Jason C Dec 01 '21 at 02:03
32

In addition to the answer by ProGM: if you see characters in boxes, like NUL or ACK, and want to get rid of them, those are ASCII control characters (0 to 31). You can find them with the following expression and remove them:

[\x00-\x1F]+

In order to remove all non-ASCII AND ASCII control characters, you should remove all characters matching this regex:

[^\x1F-\x7F]+
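To see exactly which characters each range keeps, the patterns can be compared on a small test string. This is only a sketch in Python: the sample string is invented, and `KEEP_PRINTABLE` is a stricter variant (0x20 to 0x7E) that is not part of the answer above but drops the unit separator (0x1F) and DEL (0x7F) as well.

```python
import re

CONTROL = re.compile(r"[\x00-\x1F]+")          # ASCII control characters 0-31
KEEP_FROM_1F = re.compile(r"[^\x1F-\x7F]+")    # the combined pattern as written above
KEEP_PRINTABLE = re.compile(r"[^\x20-\x7E]+")  # assumption: stricter, printable ASCII only

# Hypothetical sample: NUL, unit separator, DEL, an accented letter, a line feed.
text = "a\x00b\x1fc\x7fd\u00e9\n"

print(repr(CONTROL.sub("", text)))         # control characters gone, DEL and the accent stay
print(repr(KEEP_FROM_1F.sub("", text)))    # 0x1F and 0x7F survive
print(repr(KEEP_PRINTABLE.sub("", text)))  # only printable ASCII is left
```

All three patterns also match line feeds and carriage returns, so line breaks are removed along with everything else.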
Peter Mortensen
brunorey
  • Values from `\x00` and `\x1F` are already matched in the answer by ProGM. – Unihedron Jan 17 '15 at 17:35
  • They're matched as values you'd like to keep. I was just suggesting this in case you want to get rid of them. – brunorey Jan 19 '15 at 13:47
  • The last example should begin at 20 to exclude the unit separator character. Maybe exclude 7F as well as it's a control character too. – fgb Jan 21 '16 at 18:21
  • Brilliant! I removed all pesky non-ASCII characters using the qdap R package using: `mgsub("[^\x1F-\x7F]+", "", text_vector, fixed = FALSE)` – Pablo Adames Jun 26 '19 at 22:50
30

To remove all non-ASCII characters, search for the following pattern in Regular expression mode and replace it with nothing: [^\x00-\x7F]+

Removing non-ASCII

To highlight characters, I recommend using the Mark function in the search window: this highlights the non-ASCII characters and puts a bookmark on each line containing one of them.

If you want to highlight and put a bookmark on the ASCII characters instead, you can use the regex [\x00-\x7F] to do so.

Highlighting Non-ASCII
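For a large file, a script can produce the same list of "bookmarked" lines. Here is a small sketch in Python, assuming the file content has been read into a string; `lines_with_non_ascii` is a made-up helper name, and lines are numbered from 1 as in the editor.

```python
import re

# A single non-ASCII character is enough to flag a line.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def lines_with_non_ascii(text: str) -> list[int]:
    """Return the 1-based numbers of the lines containing a non-ASCII character."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if NON_ASCII.search(line)
    ]


sample = "abc\nd\u00e9f\nghi\n\u00fc"  # hypothetical four-line sample
print(lines_with_non_ascii(sample))
```

Inverting the condition (`not NON_ASCII.search(line)`) gives the pure-ASCII lines instead.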

Cheers

Jean-Francois T.
4

To keep new lines:

  1. First choose a placeholder character for the newline that does not otherwise occur in the file. I used #.
  2. Open Replace and select the Extended search mode.
  3. Find \n and replace with #.
  4. Hit Replace All.

Next:

  1. Select the Regular expression search mode.
  2. Find: [^\x20-\x7E]+
  3. Leave "Replace with" empty.
  4. Hit Replace All.

Now select the Extended search mode again and replace # with \n.

:) Now you have a clean ASCII file ;)
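The three passes above can be sketched in Python as follows. The function names are hypothetical, and the placeholder check is an added safeguard rather than part of the original steps. The second function shows an alternative that is not in the answer: excluding \n from the character class gives the same result in one pass.

```python
import re


def clean_keep_newlines(text: str, placeholder: str = "#") -> str:
    """Strip everything except printable ASCII, preserving line feeds."""
    if placeholder in text:
        # Otherwise existing placeholders would turn into line breaks at the end.
        raise ValueError("placeholder already occurs in the text")
    text = text.replace("\n", placeholder)     # pass 1: protect line feeds
    text = re.sub(r"[^\x20-\x7E]+", "", text)  # pass 2: drop non-printable/non-ASCII
    return text.replace(placeholder, "\n")     # pass 3: restore line feeds


def clean_single_pass(text: str) -> str:
    """Alternative sketch: keep \\n by adding it to the excluded class."""
    return re.sub(r"[^\x20-\x7E\n]+", "", text)


sample = "caf\u00e9\r\nna\u00efve\tx\n"  # hypothetical sample with CRLF and a tab
print(repr(clean_keep_newlines(sample)))
print(repr(clean_single_pass(sample)))
```

In both versions carriage returns and tabs are removed, so a Windows-style file ends up with bare line feeds.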

TooGeeky
3

Another good trick is to go into UTF8 mode in your editor so that you can actually see these funny characters and delete them yourself.

Gidon Wise
3

Another way...

  1. Install the Text FX plugin if you don't have it already
  2. Go to the TextFX menu option -> zap all non printable characters to #. It will replace all invalid chars with 3 # symbols
  3. Go to Find/Replace and look for ###. Replace it with a space.

This is nice if you can't remember the regex or don't care to look it up. But the regex mentioned by others is a nice solution as well.

goku_da_master
  • Zapping all characters replaces all type of punctuation marks with ###. The solution I would expect is: Replacing “ & ” with ". Replacing ‘ & ’ with '. etc. – Kasim Husaini Apr 19 '17 at 06:40
  • It works fine, however, the tool replaces funny chars with one # char and not three. please take note. – Raghav Aug 16 '17 at 13:58
  • The Text FX plugin is deprecated and may not even be readily available anymore. See e.g. *[TextFX's Future](http://docs.notepad-plus-plus.org/index.php/TextFX%27s_Future)* - *"When the list grows long enough, it will become practical to bid farewell to an aging workhorse that has served the community well."* – Peter Mortensen Aug 07 '18 at 22:08
  • The name has now been [hijacked by Google](https://duckduckgo.com/l/?uddg=https%3A%2F%2Fnews.ycombinator.com%2Fitem%3Fid%3D37161441&rut=90224086e5c465635d790f8dd1d882eface9a4b473c4de90c4cc4b4c65c84b16) (2023). – Peter Mortensen Aug 20 '23 at 11:54
0

  1. Click on View/Show Symbol/Show All Characters to show the [SOH] characters in the file.
  2. Click on the [SOH] symbol in the file.
  3. Press Ctrl+H to bring up the Replace dialog.
  4. Leave 'Find what:' as is.
  5. Change 'Replace with:' to the character of your choosing (comma, semicolon, other...).
  6. Click 'Replace All'.

Done and done!
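The same single-character replacement can be sketched in Python. SOH is the ASCII control character with code 1; the function name `replace_soh` and the default comma are assumptions for illustration.

```python
SOH = "\x01"  # ASCII "start of heading", displayed as [SOH] in Notepad++


def replace_soh(text: str, replacement: str = ",") -> str:
    """Replace every SOH character with the chosen delimiter."""
    return text.replace(SOH, replacement)


print(replace_soh("a\x01b\x01c"))       # comma as the delimiter
print(replace_soh("a\x01b\x01c", ";"))  # semicolon instead
```

This handles one specific character only; any other control or non-ASCII characters in the file are left as they are.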

RipVduB
  • Do you **really** want to do that for **ALL** non ASCII characters? They are thousands! – Toto Feb 11 '21 at 17:41
0

In addition to Steffen Winkler:

[\x00-\x08\x0B-\x0C\x0E-\x1F]+

This matches the ASCII control characters but leaves \r, \n and \t (carriage return, line feed, tab) untouched.
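A quick way to check which characters fall outside this class is to run it over a string that contains them. A minimal sketch in Python, with an invented sample string and a hypothetical constant name:

```python
import re

# Control characters 0-31, minus tab (0x09), line feed (0x0A) and carriage return (0x0D).
CONTROL_EXCEPT_WHITESPACE = re.compile(r"[\x00-\x08\x0B-\x0C\x0E-\x1F]+")

# Hypothetical sample: NUL, tab, CRLF, unit separator, vertical tab.
sample = "a\x00b\tc\r\nd\x1f\x0be"

print(repr(CONTROL_EXCEPT_WHITESPACE.sub("", sample)))
```

Tab and the CRLF pair remain in the output, while NUL, the unit separator and the vertical tab are removed.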

4b0
michibr81