
I am using the HTML Encode Special Characters plugin in Sublime Text to convert all special characters into their HTML codes. I have a lot of accented characters in different parts of the file, so it would be great if I could select all the special characters and then use the plugin to convert them all at once!

Is there a regex that selects only the special characters?

kashive

3 Answers


Yes.

Sublime Text supports regular expressions, so you can select all non-ASCII characters (code points above 127, i.e. above 0x7F). This regex should be enough for you:

[^\x00-\x7F]

Enable regular-expression mode in the Find panel and use Find All to select every match, or simply search and replace.
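To see what the pattern picks up outside the editor, here is a minimal Python sketch. Python is only used for illustration here; it is not what Sublime Text or the plugin runs internally. It lists every character matched by [^\x00-\x7F] together with its position, which is roughly what Find All would select:

```python
import re

# Any character outside the 7-bit ASCII range 0x00-0x7F, i.e. code point 128 or higher.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def find_non_ascii(text):
    """Return (index, character) for every non-ASCII character in text."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]

print(find_non_ascii("Café au lait, naïve résumé"))
# [(3, 'é'), (16, 'ï'), (21, 'é'), (25, 'é')]
```

Plain ASCII text, including HTML markup, produces an empty list.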

But if you are HTML-encoding by hand in the first place, you are doing it wrong. Save your files in UTF-8 encoding (the Sublime Text 2 default) and make sure your web server also serves those files as UTF-8. No conversion, encoding or anything else is needed.
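A minimal Python sketch of the "just use UTF-8" point, assuming a hypothetical page.html written to a temporary directory: the accented characters are stored directly as UTF-8 bytes and read back unchanged, with no entities involved. The server side is not shown; it only has to send the header Content-Type: text/html; charset=utf-8.

```python
import os
import tempfile

HTML = '<!DOCTYPE html>\n<meta charset="utf-8">\n<p>Crème brûlée for the café</p>\n'

def roundtrip_utf8(html):
    """Save html as UTF-8, read the raw bytes back, and decode them again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.html")  # hypothetical file name
        # newline="" stops Python from rewriting \n as \r\n on Windows.
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(html)
        with open(path, "rb") as f:
            raw = f.read()  # these bytes are what a web server would send
    return raw, raw.decode("utf-8")

raw, text = roundtrip_utf8(HTML)
print(text == HTML)        # True: nothing was lost or converted
print(b"\xc3\xa9" in raw)  # True: é is stored as the two UTF-8 bytes C3 A9
print(b"&#" in raw)        # False: no HTML entities were needed
```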

Markus Amalthea Magnuson
Mikko Ohtamaa
  • However, when coding an HTML email, using UTF-8 usually isn't an option because it's not supported in all email clients. In these cases, manual HTML encoding is necessary. – Mark Northrop Apr 19 '13 at 09:19
  • @mtnorthrop: Can you tell me when UTF-8 causes issues? I send out tons of HTML emails and I'd like to know what kind of problems I might run into. – Mikko Ohtamaa Dec 12 '13 at 18:37
  • Can't thank you enough for this... I have been looking at a non-UTF-8 data file for hours trying to figure this out. – Matt Kim Aug 07 '14 at 17:27
  • Great! This regex solution is not limited to Sublime Text; it also works in any other editor that supports regex search. – Zaphod Beeblebrox Mar 08 '22 at 10:36
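For the HTML email case, the conversion itself can also be scripted. This minimal Python sketch is an alternative to the plugin, not the plugin's own code: it uses the standard xmlcharrefreplace error handler, which turns each non-ASCII character into a decimal reference such as &#233; rather than a named entity like &eacute; (both are valid HTML):

```python
def html_encode_non_ascii(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # Encoding to ASCII fails on é, û, etc.; xmlcharrefreplace substitutes &#NNN; instead.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(html_encode_non_ascii("Crème brûlée"))  # Cr&#232;me br&#251;l&#233;e
```

ASCII characters, including markup such as < and &, are left untouched, so it can be run over an existing HTML file.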

Just as further reference (or as complement):

The Sublime Text 2/3 package Highlighter can, as its name says, highlight characters that match a regex...

"You can also add a custom regex for characters to highlight."

So, with this package plus @Mikko Ohtamaa's answer, we can edit the file...

highlighter.sublime-settings - User

...and add the proposed regex (written here as [^\\x00-\\x7F], because backslashes must be doubled inside a JSON string) so we end up with something like this:

{  
    "highlighter_regex": "(\t+ +)|( +\t+)|[^\\x00-\\x7F]|[\u2026\u2018\u2019\u201c\u201d\u2013\u2014]|[\t ]+$"  
}

The result is that every non-ASCII character (code point above 127) in the file is highlighted automatically.

Note that this will not select those characters; it only highlights them, so you can easily see whether there are any.
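To check why the backslashes are doubled in the settings file, this Python sketch loads the Highlighter settings shown above with json and compiles the result with re. Python's re module stands in for the plugin's own regex handling here, which is an assumption; the point is only that JSON decoding turns \\x00 into the regex escape \x00:

```python
import json
import re

# The settings text as shown above (raw string, so Python keeps every backslash).
SETTINGS = r'''{
    "highlighter_regex": "(\t+ +)|( +\t+)|[^\\x00-\\x7F]|[\u2026\u2018\u2019\u201c\u201d\u2013\u2014]|[\t ]+$"
}'''

pattern_text = json.loads(SETTINGS)["highlighter_regex"]
print(r"[^\x00-\x7F]" in pattern_text)  # True: JSON decoded "\\x00" to the regex escape \x00

highlight = re.compile(pattern_text)
hits = [m.group(0) for m in highlight.finditer("Déjà vu\u2026")]
print(hits)  # ['é', 'à', '…']
```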

Community
gmo

Another plugin option

I recently wrote a plugin dedicated to highlighting non-ASCII characters: https://github.com/TuureKaunisto/highlight-dodgy-chars

Exactly the same functionality can be achieved with Highlighter, but with the less generic Highlight Dodgy Chars plugin you don't need to write a regular expression: you just list, in the settings, the non-ASCII characters you don't want highlighted. European special characters are whitelisted by default.
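The whitelist idea can be sketched in a few lines of Python. The WHITELIST set and the dodgy_chars function below are hypothetical illustrations, not the plugin's real settings or code, and the whitelist is not its actual default list:

```python
# Hypothetical whitelist of accented letters that should NOT be flagged.
# This is NOT the plugin's real default list.
WHITELIST = set("áàâäãåæçéèêëíìîïñóòôöõøœßúùûüÿÁÀÂÄÃÅÆÇÉÈÊËÍÌÎÏÑÓÒÔÖÕØŒÚÙÛÜ")

def dodgy_chars(text, whitelist=WHITELIST):
    """Return (index, char) for every non-ASCII character that is not whitelisted."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0x7F and ch not in whitelist]

print(dodgy_chars("Smart quotes \u201chere\u201d in a café"))
# [(13, '“'), (18, '”')]
```

Compared with a single regex, a whitelist like this is easier to extend one character at a time.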

Tuure