I do the following:
```python
re.sub(r'[^ \nA-Za-z0-9/]+', '', document)
```
to remove every character that is not alphanumeric, a space, a newline, or a forward slash.
In other words, I want to remove all special characters except spaces, newlines, and forward slashes.
However, I do not want to remove the accented letters used in various languages, such as French, German, etc.
But if I run the code above, the word `Motörhead`, for example, becomes `Motrhead`, which is not what I want.
So how can I change the code above so that it keeps the accented letters?
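One possible approach is a minimal sketch that assumes Python 3 with `document` as a `str`. In that setting `\w` in an `re` pattern is Unicode-aware and matches any character for which `str.isalnum()` is true, plus the underscore. The helper name `strip_special` is hypothetical. Because the original class `[^ \nA-Za-z0-9/]` also removed `_`, the underscore is dropped by a separate alternative. The text is NFC-normalized first so that a decomposed `o` + U+0308 becomes a single `ö` that `\w` matches.

```python
import re
import unicodedata

# Remove anything that is not a Unicode word character, space, newline or "/".
# \w also matches "_", so underscores are removed by the second alternative.
_UNWANTED = re.compile(r'[^\w \n/]+|_+')

def strip_special(document: str) -> str:
    # Compose "o" + combining diaeresis into "ö"; a lone combining mark is
    # not matched by \w and would otherwise be stripped.
    document = unicodedata.normalize('NFC', document)
    return _UNWANTED.sub('', document)

print(strip_special('Motörhead!'))          # Motörhead
print(strip_special('Zażółć gęślą jaźń?'))  # Zażółć gęślą jaźń
```

One consequence of using `\w` is that it also keeps letters from non-Latin scripts (Cyrillic, Greek, CJK) and non-ASCII digits. If that is too broad, an explicit list of Unicode ranges would be needed instead.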
UPDATE:
@MattM below suggested a solution that works for languages such as English, French, and German, but it does not work for languages such as Polish: all the accented letters are still removed.
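A failure like this is what one would expect from a character class limited to the Latin-1 range. MattM's actual code is not quoted here, so the `À-ÿ` (U+00C0 to U+00FF) range below is a hypothetical stand-in. Most Polish letters (ą, ć, ę, ł, ń, ś, ź, ż) live in Latin Extended-A, above U+00FF, so such a class strips them. This sketch shows the effect. Note also that `À-ÿ` includes the non-letters `×` and `÷`.

```python
import re

# Hypothetical Latin-1-only class: covers French/German accents, not Polish.
LATIN1_ONLY = re.compile(r'[^ \nA-Za-z0-9/À-ÿ]+')

for ch in 'ąćęłńóśźż':
    # Anything above U+00FF falls outside the À-ÿ range.
    print(ch, f'U+{ord(ch):04X}', 'kept' if ch <= 'ÿ' else 'removed')

print(LATIN1_ONLY.sub('', 'Motörhead'))          # Motörhead
print(LATIN1_ONLY.sub('', 'Zażółć gęślą jaźń'))  # Zaó gl ja
```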