I use this code: b = re.sub('[^A-Za-z]+', ' ', a)
. Nevertheless i need to take account of the french accents: àâéèêëïîôùûç
. Can you please help me? :)
Thanks.
I use this code: b = re.sub('[^A-Za-z]+', ' ', a)
. Nevertheless i need to take account of the french accents: àâéèêëïîôùûç
. Can you please help me? :)
Thanks.
If you're like to replace all the letters, taking into account unicode, do the following:
text = "àâéèêëïîôùûç"
re.sub('\w+', ' ', text, re.UNICODE)
Please note that the re.UNICODE
is not needed in python3, as it does unicode matching by default.
Regex for accented characters has been covered before really well over here.
If you're dealing with French accents (not umlauts etc) then you're code could be updated like this:
b = re.sub('[^A-zÀ-ú]+', ' ', a)
That should amend your previous "all upper and lower case letters" to "all upper and lower case letters including accents"