Sublime text replace multiple accented characters with unaccented ones at once

Question

I need to replace all characters with an accent in a text file, that is:

á é í ó ú ñ

for their non-accent equivalents:

a e i o u n

Can this be achieved via some regex command for the entire file at once?

Update (Feb 1st, 2017)

I took the great answer by Keith Hall and turned it into a Sublime package. You can find it here: RemoveNonAsciiChars.

Not sure about the regex, but, personally, I would ensure Unicode support for whatever the file is for, instead. — Sebastian Simon, Aug 12 '16 at 02:53
That's not up to me sadly. There's no unicode support, thus why I need to replace the characters with accents. — Gabriel, Aug 12 '16 at 02:57

score 21 · Accepted Answer · edited May 23 '17 at 12:10

You can use a regex like:

(?=\p{L})[^a-zA-Z]

to find the characters with diacritics.

(?=\p{L}) positive lookahead to ensure the next character is a Unicode letter
[^a-zA-Z] negative character class to exclude letters without diacritics.

This is necessary because Sublime Text (or, more specifically, the Boost regex engine it uses for Find and Replace) doesn't support \p{M}. See http://www.regular-expressions.info/unicode.html for more information on what the \p meta character does.
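If you want to check what that pattern selects outside the editor, here is a rough Python sketch of the same idea. Python's built-in re module does not support \p{L}, so this uses str.isalpha() as a stand-in for "is a Unicode letter". The function name find_non_ascii_letters is made up for illustration.

```python
def find_non_ascii_letters(text):
    """Return the letters in text that are not plain ASCII a-z / A-Z.

    str.isalpha() is true for characters in the Unicode letter categories,
    which plays the role of the (?=\\p{L}) lookahead; the ord() check plays
    the role of [^a-zA-Z].
    """
    return [ch for ch in text if ch.isalpha() and ord(ch) > 127]


print(find_non_ascii_letters("señor José 123"))  # ['ñ', 'é']
```

Digits, spaces and punctuation are skipped because they are not letters, and unaccented letters are skipped because they are ASCII.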

For replacing, unfortunately, you will need to specify the characters to replace manually. To make it harder, ST doesn't seem to support POSIX character equivalence classes (such as [[=a=]]), nor does it support conditionals in the replacement, which would allow you to do the find and replace in one pass using capture groups.

Therefore, you would need to use multiple find expressions like:

[ÀÁÂÃÄÅ]

replace with

A

and

[àáâãäå]

replace with

a

etc.

which is a lot of manual work.
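For comparison, the same group-by-group replacement can be done in one pass with a translation table in plain Python. This is only a sketch: the GROUPS table and the replace_accented name are hypothetical, and the groups listed are examples rather than a complete set.

```python
# Illustrative groups only; extend the table with whatever characters the file contains.
GROUPS = {
    "ÀÁÂÃÄÅ": "A",
    "àáâãäå": "a",
    "ÈÉÊË": "E",
    "èéêë": "e",
    "ìíîï": "i",
    "òóôõö": "o",
    "ùúûü": "u",
    "ñ": "n",
}

# Expand each group so every accented character maps to its plain letter.
TABLE = str.maketrans({ch: plain for group, plain in GROUPS.items() for ch in group})


def replace_accented(text):
    """Apply all replacements in a single pass over the text."""
    return text.translate(TABLE)


print(replace_accented("á é í ó ú ñ"))  # a e i o u n
```

Characters that are not in the table are left untouched, so this still needs the table to be filled in by hand.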

A much easier/quicker/less-manual-work approach would be to use the Python API instead of regex:

Tools menu -> Developer -> New Plugin

Paste in the following:

import sublime
import sublime_plugin
import unicodedata

class RemoveNonAsciiCharsCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        entire_view = sublime.Region(0, self.view.size())
        ascii_only = unicodedata.normalize('NFKD', self.view.substr(entire_view)).encode('ascii', 'ignore').decode('utf-8')
        self.view.replace(edit, entire_view, ascii_only)

Save it in the folder ST recommends (which will be your Packages/User folder), as something like remove_non_ascii_chars.py (file extension is important, base name isn't)
View menu -> Show Console
Type/paste in view.run_command('remove_non_ascii_chars') and press Enter
The diacritics will have been removed (the characters with an accent will have been converted to their non-accented equivalents).
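If you would like to try the conversion before touching a real file, the core expression of the plugin can be run in any Python 3 interpreter. This is a minimal sketch; the to_ascii helper name is made up here and is not part of the plugin.

```python
import unicodedata


def to_ascii(text):
    """Decompose accented characters, then drop everything that is not ASCII."""
    # NFKD splits e.g. "á" into "a" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with errors="ignore" discards the combining accents.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("á é í ó ú ñ"))  # a e i o u n
```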

Note: the above will also remove all other non-ASCII characters, not only the diacritics.
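If dropping the other non-ASCII characters is a problem, one possible variation is to remove only the combining marks after decomposition and keep everything else. The sketch below is an alternative technique, not what the plugin above does. The strip_diacritics name is hypothetical, and it uses NFD rather than NFKD so that unrelated characters are not rewritten into compatibility forms.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks (accents) but keep other non-ASCII characters."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for anything that is not a combining mark.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual precomposed form.
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics("niño €5"))  # nino €5
```

Inside the plugin this would replace the expression assigned to ascii_only. Whether the remaining characters are acceptable depends on what the target system can handle.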
