You can use a regex like:
(?=\p{L})[^a-zA-Z]
to find the characters with diacritics.
(?=\p{L})
positive lookahead to ensure the next character is a Unicode letter
[^a-zA-Z]
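negated character class that excludes the unaccented ASCII letters a-z and A-Z. Note that what remains is any other Unicode letter, so non-Latin letters (Greek, Cyrillic, etc.) will match too.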
This is necessary because Sublime Text (or, more specifically, the Boost regex engine it uses for Find and Replace) doesn't support \p{M}. See http://www.regular-expressions.info/unicode.html for more information on what the \p meta character does.
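To see what that pattern selects, here is a rough Python equivalent. It is only a sketch: Python's built-in re module has no \p{L}, so this uses unicodedata.category instead, and the function name find_non_ascii_letters is made up for illustration.

```python
import string
import unicodedata

ASCII_LETTERS = set(string.ascii_letters)


def find_non_ascii_letters(text):
    """Return (index, char) pairs for Unicode letters outside a-z / A-Z."""
    return [
        (i, ch)
        for i, ch in enumerate(text)
        # General categories starting with "L" correspond to \p{L}.
        if unicodedata.category(ch).startswith("L") and ch not in ASCII_LETTERS
    ]


# "Café, naïve, λ" written with escapes so the precomposed forms are explicit.
print(find_non_ascii_letters("Caf\u00e9, na\u00efve, \u03bb"))
```

The Greek lambda is reported alongside é and ï, which shows that the pattern finds every non-ASCII letter, not only accented Latin ones.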
For replacing, unfortunately, you will need to specify the characters to replace manually. To make it harder, ST doesn't seem to support POSIX character equivalence classes (such as [[=a=]]), nor does it support conditionals in the replacement, which would have allowed you to do the find and replace in one pass using capture groups.
Therefore, you would need to use multiple find expressions like:
[ÀÁÂÃÄÅ]
replace with
A
and
[àáâãäå]
replace with
a
etc.
which is a lot of manual work.
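For comparison, the same hand-maintained mapping can be written as a translation table in plain Python. This is a sketch only: the REPLACEMENTS dictionary and the replace_manually name are hypothetical, and it covers just the two groups listed above, so every other accented letter would still have to be added by hand.

```python
# Hypothetical mapping: each key is a group of accented characters,
# each value is the plain letter that replaces all of them.
REPLACEMENTS = {
    "ÀÁÂÃÄÅ": "A",
    "àáâãäå": "a",
}

# Flatten the groups into a single per-character translation table.
TABLE = str.maketrans(
    {ch: plain for chars, plain in REPLACEMENTS.items() for ch in chars}
)


def replace_manually(text):
    """Replace only the characters explicitly listed in REPLACEMENTS."""
    return text.translate(TABLE)


print(replace_manually("\u00c5ngstr\u00f6m"))  # the ö is not in the table, so it stays
```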
A much quicker, less manual approach is to use the Python API instead of a regex:
- Tools menu -> Developer -> New Plugin
- Paste in the following:
import sublime
import sublime_plugin
import unicodedata


class RemoveNonAsciiCharsCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        entire_view = sublime.Region(0, self.view.size())
        # Decompose accented characters (NFKD), then drop everything that is not ASCII.
        decomposed = unicodedata.normalize('NFKD', self.view.substr(entire_view))
        ascii_only = decomposed.encode('ascii', 'ignore').decode('ascii')
        self.view.replace(edit, entire_view, ascii_only)
- Save it in the folder ST recommends (which will be your Packages/User folder), as something like remove_non_ascii_chars.py (the file extension is important, the base name isn't)
- View menu -> Show Console
- Type/paste in view.run_command('remove_non_ascii_chars') and press Enter
- The diacritics will have been removed (the characters with an accent will have been converted to their non-accented equivalents).
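Note: the above will also remove every other non-ASCII character. Anything that has no ASCII decomposition (for example ß, ø, or CJK text) is dropped entirely rather than converted.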
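If dropping those characters is a problem, one possible variation is to remove only the combining marks after decomposition. The following standalone sketch (no Sublime Text API involved; the function names to_ascii and strip_marks are made up for illustration) contrasts the two behaviours:

```python
import unicodedata


def to_ascii(text):
    """Same approach as the plugin: decompose, then drop all non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def strip_marks(text):
    """Decompose, drop only combining marks (category M*), then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )
    return unicodedata.normalize("NFC", kept)


# "Crème brûlée, Straße, 東京" written with escapes.
sample = "Cr\u00e8me br\u00fbl\u00e9e, Stra\u00dfe, \u6771\u4eac"
print(repr(to_ascii(sample)))     # ß and the CJK characters are gone
print(repr(strip_marks(sample)))  # only the accents are removed
```

To use the second variant in the plugin, the ascii_only line would call the mark-stripping logic instead. Be aware that it also strips marks that carry meaning in other scripts, so it is best suited to mostly Latin text.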
Further reading: