Regex Python french accent

Question

I use this code: b = re.sub('[^A-Za-z]+', ' ', a). However, I also need to account for French accented characters: àâéèêëïîôùûç. Can you please help me? :)

Thanks.

Check this question https://stackoverflow.com/questions/1922097/regular-expression-for-french-characters — Fermín Rodríguez del Castillo, Jun 01 '20 at 04:35

score 0 · Answer 1 · answered Jun 01 '20 at 04:38

If you'd like to replace all the letters, taking Unicode into account, do the following:

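import re
text = "àâéèêëïîôùûç"
# Pass flags by keyword: the fourth positional argument of re.sub is count, not flags.
re.sub(r'\w+', ' ', text, flags=re.UNICODE)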

Note that re.UNICODE is not needed in Python 3, where str patterns use Unicode matching by default.
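The `\w+` pattern replaces the letters themselves, whereas the question asks to replace everything that is not a letter. A minimal sketch of that inverse, assuming Python 3 and a made-up sample string (the names `a` and `b` are taken from the question):

```python
import re

# Sample input, made up for illustration.
a = "Ça coûte 3 euros, déjà_vu!"

# [\W\d_]+ matches runs of characters that are NOT letters:
# \W = non-word characters, \d = digits, _ = underscore.
b = re.sub(r'[\W\d_]+', ' ', a)

print(b)  # Ça coûte euros déjà vu   (with a trailing space from the "!")
```

This keeps letters from any script, not only French ones, so it may be broader than what the question needs.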

Roy2012


FYI: always pass `flags=`, `count=`, etc. explicitly by keyword, as the positions differ between functions. For example: `sub(pattern, repl, string, count=0, flags=0)` and `findall(pattern, string, flags=0)`, so your code is actually doing `count=re.UNICODE` – Sundeep Jun 25 '20 at 09:21
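The pitfall described in this comment can be seen in a small sketch. It uses `re.IGNORECASE` instead of `re.UNICODE` only because the difference is easier to observe; the sample string is made up. The first call spells out what a flag in the fourth positional slot is interpreted as.

```python
import re

s = "aAaAa"

# The 4th positional parameter of re.sub is count, and re.IGNORECASE has
# the integer value 2, so sub("a", "x", s, re.IGNORECASE) behaves like this:
as_count = re.sub("a", "x", s, count=int(re.IGNORECASE))

# Passing the flag by keyword gives the intended case-insensitive match.
as_flag = re.sub("a", "x", s, flags=re.IGNORECASE)

print(as_count)  # xAxAa  (only two replacements, still case-sensitive)
print(as_flag)   # xxxxx
```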

score 0 · Accepted Answer · answered Jun 01 '20 at 09:09

Regex for accented characters has been covered before really well over here.

If you're dealing with French accents (not umlauts etc.) then your code could be updated like this:

b = re.sub('[^A-zÀ-ú]+', ' ', a)

That should change your previous pattern from "all upper and lower case letters" to "all upper and lower case letters including accents".

houseofleft


`À-ú` matches much more than French accented characters. And doesn't match lowercase. `A-z` matches more than just letters. Have a look at an [ASCII table](http://www.asciitable.com/) – Toto Jun 01 '20 at 09:16
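The range problems raised in this comment can be checked directly, and one possible alternative is to list the accented letters from the question explicitly. This is only a sketch: the `FRENCH` constant and the sample string are made up for illustration, the letter list is the one given in the question (so it omits letters such as œ or ü), and the input is assumed to use precomposed (NFC) characters.

```python
import re

# A-z also spans the ASCII punctuation between 'Z' and 'a': [ \ ] ^ _ `
extra_ascii = re.findall(r'[A-z]', '[_^]')

# À-ú (U+00C0 to U+00FA) includes × and ÷ but stops just before û (U+00FB).
extra_latin1 = re.findall(r'[À-ú]', '×÷û')

# Explicit class: ASCII letters plus the accented letters from the question,
# in lower and upper case.
FRENCH = 'àâéèêëïîôùûç'
pattern = re.compile('[^A-Za-z' + FRENCH + FRENCH.upper() + ']+')

a = "Ça coûte 3 euros, déjà_vu!"
b = pattern.sub(' ', a)

print(extra_ascii)   # ['[', '_', '^', ']']
print(extra_latin1)  # ['×', '÷']
print(b)             # Ça coûte euros déjà vu   (with a trailing space)
```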
