regular expression for French characters

Question

I need a function or a regular expression to validate strings that contain only alphabetic characters (including French ones), the minus sign (-), the dot (.) and the space. Everything else should be rejected.

Thanks

score 37 · Accepted Answer · edited Jul 04 '17 at 14:22

/^[a-zàâçéèêëîïôûùüÿñæœ .-]*$/i

The /i flag makes the match case-insensitive, which keeps the character class short. If you don't want to allow empty strings, change * to +.
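To see how this character class behaves on a few inputs, here is a minimal Python sketch. Python itself and the names `ALLOWED` and `is_valid` are assumptions made for illustration; the answer only gives the pattern.

```python
import re

# Same character class as the answer. re.IGNORECASE stands in for /i,
# and fullmatch() plays the role of the ^ and $ anchors.
ALLOWED = re.compile(r"[a-zàâçéèêëîïôûùüÿñæœ .-]*", re.IGNORECASE)

def is_valid(text: str) -> bool:
    """Return True if text uses only the characters listed in the class."""
    return ALLOWED.fullmatch(text) is not None

print(is_valid("Jean-François"))  # True
print(is_valid("Björn Borg"))     # False: ö is not in the list
print(is_valid("O'Neil"))         # False: the apostrophe is not allowed
```

Because the class is an explicit list, any letter that is missing from it (such as ö) is rejected, which may or may not be what you want.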

edited Jul 04 '17 at 14:22

Sebastian Brosch

answered Dec 17 '09 at 14:26

Amber

4

Strictly speaking ñ is not French, but the OP did not explicitly exclude non-French characters. – mouviciel Dec 17 '09 at 14:36
French doesn't use some of the characters the regular expression is trying to match, and the regular expression is not generic enough if it wants to match all those characters that are used in languages like French. – apaderno Dec 17 '09 at 14:39
2

@kiamluno, except for ñ, French uses all the other characters mentioned, some in only a few words, like "où" (where) or "L'Haÿ-les-Roses" (a city near Paris) – mouviciel Dec 17 '09 at 15:08
1

@Amber, I think the `u` pattern modifier is appropriate here, n'est-ce pas? – mickmackusa Feb 07 '20 at 08:34
The problem with this approach is with newspaper headlines like « Björn Borg roi de Paris!» – PatrickT Jun 08 '20 at 15:47
For anyone looking to include uppercase as well, here's all those naughty characters in uppercase too: [a-zàâçéèêëîïôûùüÿñæœA-ZÀÂÇÉÈÊËÎÏÔÛÙÜŸÑÆŒ .-] (tip: paste them into Microsoft Word, select them, then use Shift-F3) – user3012629 May 06 '21 at 11:07

score 28 · Answer 2 · answered Jun 19 '17 at 10:49

Simplified solution:

/^[a-zA-ZÀ-ÿ-. ]*$/

Explanation:

^         Start of the string
[ ... ]*  Zero or more of the following:
  a-z       lowercase letters
  A-Z       uppercase letters
  À-ÿ       lowercase and uppercase accented letters (U+00C0 to U+00FF), including letters with an umlaut
  -         dashes
  .         periods
            spaces
$         End of the string

answered Jun 19 '17 at 10:49

Sam G

9

I think `À-ÿ` is wrong. Shouldn't it be `À-Ÿ` ([example](https://pythex.org/?regex=%5B%C3%80-%C5%B8%5D%2B&test_string=%C3%80%C3%82%C3%87%C3%89%C3%88%C3%8A%C3%8B%C3%8E%C3%8F%C3%94%C3%9B%C3%99%C3%9C%C5%B8%C3%91%C3%86%C5%92%C3%A0%C3%A2%C3%A7%C3%A9%C3%A8%C3%AA%C3%AB%C3%AE%C3%AF%C3%B4%C3%BB%C3%B9%C3%BC%C3%BF%C3%B1%C3%A6%C5%93&ignorecase=0&multiline=0&dotall=0&verbose=0))? – Stefan Falk Aug 29 '19 at 09:53
Yes, you're right. – Boommeister Feb 18 '22 at 09:02
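A quick way to compare the two ranges discussed here is to test individual characters against each one. The sketch below assumes Python; the names `latin1`, `extended` and `matched_by` are made up for illustration.

```python
import re

latin1 = re.compile("[\u00c0-\u00ff]")    # À-ÿ
extended = re.compile("[\u00c0-\u0178]")  # À-Ÿ

def matched_by(pattern, chars):
    """Return the characters from chars that the pattern accepts."""
    return "".join(c for c in chars if pattern.fullmatch(c))

# é, ÿ, Œ, œ, Ÿ, then the multiplication and division signs
sample = "\u00e9\u00ff\u0152\u0153\u0178\u00d7\u00f7"

print(matched_by(latin1, sample))    # é ÿ × ÷ (Œ, œ and Ÿ are missed)
print(matched_by(extended, sample))  # all seven characters
```

Both ranges let × and ÷ through, and À-Ÿ additionally accepts the Latin Extended-A letters between U+0100 and U+0178 (for example ł), so neither range is an exact match for French.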

score 8 · Answer 3 · answered Dec 17 '09 at 14:29

Try:

/^[\p{L}\-. ]*$/u

This says:

^         Start of the string
[ ... ]*  Zero or more of the following:
  \p{L}     Unicode letter characters
  -         dashes
  .         periods
            spaces
$         End of the string
/u        Enable Unicode mode in PHP
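For comparison, here is roughly the same idea in Python. This is only a sketch under a stated assumption: the standard `re` module has no `\p{L}`, so `[^\W\d_]` (a word character that is neither a digit nor an underscore) is swapped in as an approximation. It is slightly broader than `\p{L}`, since it also lets through a few numeric characters such as ². The names `LETTERS_DASH_DOT_SPACE` and `is_valid` are hypothetical.

```python
import re

# [^\W\d_] approximates \p{L}; the second alternative adds dash, period and space.
LETTERS_DASH_DOT_SPACE = re.compile(r"(?:[^\W\d_]|[-. ])*")

def is_valid(text: str) -> bool:
    """Return True if text consists only of letters, dashes, periods and spaces."""
    return LETTERS_DASH_DOT_SPACE.fullmatch(text) is not None

print(is_valid("Björn Borg"))       # True: any Unicode letter is accepted
print(is_valid("Жан"))              # True: so are letters from other scripts
print(is_valid("L'Haÿ-les-Roses"))  # False: the apostrophe is not allowed
```

The trade-off compared with an explicit list is that letters from every script pass, not only those used in French. The third-party `regex` package does support `\p{L}` directly if an exact equivalent is needed.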

score 3 · Answer 4 · answered Feb 22 '19 at 14:39

The character class I've been using is the following:

[\wÀ-Üà-øoù-ÿŒœ]

This covers a slightly larger character set than French alone, but excludes a large portion of Eastern European and Scandinavian diacritics and letters that are not relevant to French. I find this a decent compromise between brevity and exclusivity.

To match/validate complete sentences, I use this expression: [\w\s.,!?:;&#%’'"()«»À-Üà-øoù-ÿŒœ], which includes punctuation and French style quotation marks.

score 1 · Answer 5 · answered Jul 13 '20 at 19:54

Simply use the following code:

     /[\u00C0-\u017F]/
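Note that this pattern is unanchored and matches a single character, so on its own it detects whether a string contains a character in that range rather than validating the whole string. A minimal Python sketch of that behaviour (Python and the names `ACCENTED` and `has_accented_char` are assumptions for illustration):

```python
import re

# U+00C0 to U+017F: the accented part of Latin-1 plus Latin Extended-A.
ACCENTED = re.compile("[\u00c0-\u017f]")

def has_accented_char(text: str) -> bool:
    """True if at least one character falls in the range (a search, not a validation)."""
    return ACCENTED.search(text) is not None

print(has_accented_char("déjà vu"))          # True
print(has_accented_char("deja vu"))          # False
print(has_accented_char("déjà vu 123 !!!"))  # True: digits and punctuation do not matter
```

To validate input, the range would have to go inside an anchored character class together with a-z, A-Z and the allowed punctuation.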

answered Jul 13 '20 at 19:54

lookly Dev

score 0 · Answer 6 · answered Aug 16 '14 at 00:16

This regex matches the entire French text of Cyrano de Bergerac (you will need to remove the markup characters first): http://www.gutenberg.org/files/1256/1256-8.txt

^([0-9A-Za-z\u00C0-\u017F\ ,.\;'\-()\s\:\!\?\"])+

answered Aug 16 '14 at 00:16

PyWebDesign

You're escaping a bunch of stuff that doesn't need to be escaped within a character class – Tom Auger Feb 22 '19 at 14:40

score 0 · Answer 7 · answered Apr 26 '22 at 12:06

All French and Spanish accents:

/^[a-zA-ZàâäæáãåāèéêëęėēîïīįíìôōøõóòöœùûüūúÿçćčńñÀÂÄÆÁÃÅĀÈÉÊËĘĖĒÎÏĪĮÍÌÔŌØÕÓÒÖŒÙÛÜŪÚŸÇĆČŃÑ .-]*$/

answered Apr 26 '22 at 12:06

EricCartman

score -2 · Answer 8 · answered Dec 17 '09 at 14:25

This might suit:

/^[ a-zA-Z\xBF-\xFF\.-]+$/

It lets a few extra chars in, like ÷, but it handles quite a few of the accented characters.
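To see exactly which extra characters slip through, one can enumerate the range and keep the code points that are not letters. The sketch below assumes Python; `non_letters_in_range` is a hypothetical helper name.

```python
def non_letters_in_range(first: int, last: int) -> list:
    """List the characters between two code points (inclusive) that are not letters."""
    return [chr(cp) for cp in range(first, last + 1) if not chr(cp).isalpha()]

print(non_letters_in_range(0xBF, 0xFF))  # ['¿', '×', '÷']
```

Starting the range at \xC0 instead of \xBF would drop ¿, but × and ÷ sit in the middle of the accented letters and can only be excluded by splitting the range.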

answered Dec 17 '09 at 14:25

nickf

score -3 · Answer 9 · answered Dec 17 '09 at 14:42

/[A-Za-z-\.\s]/u should work. The /u switch enables UTF-8 mode.

answered Dec 17 '09 at 14:42

Pragati Sureka
