24

I need a function or a regular expression to validate strings that contain only alphabetic characters (including French accented letters), the minus sign (-), the dot (.) and the space, and nothing else.

Thanks

Yacoby
nextu

9 Answers

37
/^[a-zàâçéèêëîïôûùüÿñæœ .-]*$/i

The /i flag makes the match case-insensitive, which keeps the character class short. If you don't want to allow empty strings, change * to +.
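A minimal JavaScript sketch of this pattern wrapped in a validator. `isValidFrenchText` is a hypothetical name, and the sketch assumes the source file is saved as UTF-8. In JavaScript the `i` flag also pairs the accented letters with their capitals (é matches É), so uppercase input passes without listing the capitals.

```javascript
// Pattern from the answer: French letters, space, dot and hyphen; "i" = case-insensitive.
const FRENCH_TEXT = /^[a-zàâçéèêëîïôûùüÿñæœ .-]*$/i;

// Hypothetical helper: true only if every character of s is in the class.
function isValidFrenchText(s) {
  return FRENCH_TEXT.test(s);
}

console.log(isValidFrenchText("Haÿ-les-Roses"));   // true
console.log(isValidFrenchText("ÉLÉONORE"));        // true (case-insensitive)
console.log(isValidFrenchText("L'Haÿ-les-Roses")); // false: apostrophe not allowed
console.log(isValidFrenchText("Björn"));           // false: ö is not in the class
console.log(isValidFrenchText(""));                // true, because of *
```

The apostrophe case matters for real French names such as "L'Haÿ-les-Roses": add `'` (and `’`) to the class if names like that must pass.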

Sebastian Brosch
Amber
  • 4
    Strictly speaking ñ is not French, but the OP did not explicitly exclude non-French characters. – mouviciel Dec 17 '09 at 14:36
  • French doesn't use some of the characters the regular expression matches, and if the goal is to match all the characters used in languages like French, the regular expression is not generic enough. – apaderno Dec 17 '09 at 14:39
  • 2
    @kiamluno, except for ñ, French uses all the other characters listed, some only in a few words, such as "où" (where) or "L'Haÿ-les-Roses" (a city near Paris). – mouviciel Dec 17 '09 at 15:08
  • 1
    @Amber, I think the `u` pattern modifier is appropriate here, n'est-ce pas? – mickmackusa Feb 07 '20 at 08:34
  • The problem with this approach is with newspaper headlines like « Björn Borg roi de Paris!» – PatrickT Jun 08 '20 at 15:47
  • For anyone looking to include uppercase as well, here are all those naughty characters in uppercase too: [a-zàâçéèêëîïôûùüÿñæœA-ZÀÂÇÉÈÊËÎÏÔÛÙÜŸÑÆŒ .-] (tip: paste them into Microsoft Word, select them, then press Shift-F3) – user3012629 May 06 '21 at 11:07
28

Simplified solution:

/^[a-zA-ZÀ-ÿ-. ]*$/

Explanation:

^         Start of the string
[ ... ]*  Zero or more of the following:
  a-z       lowercase letters
  A-Z       uppercase letters
  À-ÿ       U+00C0 to U+00FF: accented lowercase and uppercase letters, including letters with an umlaut (this range also contains × and ÷, and misses Œ, œ and Ÿ)
  -         dashes
  .         periods
            spaces
$         End of the string

Sam G
  • 9
    I think `À-ÿ` is wrong. Shouldn't it be `À-Ÿ` ([example](https://pythex.org/?regex=%5B%C3%80-%C5%B8%5D%2B&test_string=%C3%80%C3%82%C3%87%C3%89%C3%88%C3%8A%C3%8B%C3%8E%C3%8F%C3%94%C3%9B%C3%99%C3%9C%C5%B8%C3%91%C3%86%C5%92%C3%A0%C3%A2%C3%A7%C3%A9%C3%A8%C3%AA%C3%AB%C3%AE%C3%AF%C3%B4%C3%BB%C3%B9%C3%BC%C3%BF%C3%B1%C3%A6%C5%93&ignorecase=0&multiline=0&dotall=0&verbose=0))? – Stefan Falk Aug 29 '19 at 09:53
  • Yes, you're right. – Boommeister Feb 18 '22 at 09:02
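A quick JavaScript check of the comment above, assuming a UTF-8 source file: `À-ÿ` covers U+00C0 to U+00FF and misses Œ, œ and Ÿ, while `À-Ÿ` runs to U+0178 and includes them. Both ranges also accept the symbols × and ÷, and `À-Ÿ` additionally accepts Latin Extended-A letters that French doesn't use, such as ł.

```javascript
// The two ranges discussed above, each applied to the whole string.
const latin1Range = /^[À-ÿ]+$/;   // U+00C0 to U+00FF
const extendedRange = /^[À-Ÿ]+$/; // U+00C0 to U+0178

for (const ch of ["œ", "Œ", "Ÿ", "ł", "×", "÷"]) {
  // Prints the character, then whether each range accepts it.
  console.log(ch, latin1Range.test(ch), extendedRange.test(ch));
}
// œ false true
// Œ false true
// Ÿ false true
// ł false true
// × true true
// ÷ true true
```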
8

Try:

/^[\p{L}. -]*$/u

This says:

^         Start of the string
[ ... ]*  Zero or more of the following:
  \p{L}     Unicode letter characters
  -         dashes
  .         periods
            spaces
$         End of the string
/u        Enable Unicode mode in PHP
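The same idea in JavaScript, as a sketch assuming an ES2018+ engine (needed for `\p{…}`). In JavaScript's `u` mode a hyphen directly after `\p{L}` is a syntax error, so the hyphen goes at the end of the class, where it is always a literal. Keep in mind that `\p{L}` accepts letters from every script, not just French ones.

```javascript
// \p{L} requires the "u" flag; the trailing hyphen is a literal, not a range.
const LETTERS_DOT_SPACE_HYPHEN = /^[\p{L}. -]*$/u;

console.log(LETTERS_DOT_SPACE_HYPHEN.test("Jean-Édouard Œuvre"));   // true
console.log(LETTERS_DOT_SPACE_HYPHEN.test("Москва"));               // true: not limited to French
console.log(LETTERS_DOT_SPACE_HYPHEN.test("Paris!"));               // false
console.log(LETTERS_DOT_SPACE_HYPHEN.test("e\u0301"));              // false: U+0301 is a mark
console.log(LETTERS_DOT_SPACE_HYPHEN.test("e\u0301".normalize("NFC"))); // true: becomes "é"
```

Text entered in decomposed form (e followed by the combining accent U+0301) fails, because the combining accent is a mark (`\p{M}`), not a letter; normalizing to NFC first, as in the last line, avoids that.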
John Feminella
3

The character class I've been using is the following:

[\wÀ-Üà-øoù-ÿŒœ]. This covers a slightly larger character set than only French, but excludes a large portion of Eastern European and Scandinavian diacriticals and letters that are not relevant to French. I find this a decent compromise between brevity and exclusivity.

To match/validate complete sentences, I use this expression: [\w\s.,!?:;&#%’'"()«»À-Üà-øoù-ÿŒœ], which includes punctuation and French style quotation marks.
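As written, the class matches a single character. A JavaScript sketch that anchors the sentence class (copied unchanged) so the whole string must consist of allowed characters. Note that the class contains no literal hyphen, so hyphenated words are rejected unless `-` is added at the end of the class.

```javascript
// The sentence class from this answer, anchored at both ends.
const FRENCH_SENTENCE = /^[\w\s.,!?:;&#%’'"()«»À-Üà-øoù-ÿŒœ]+$/;

console.log(FRENCH_SENTENCE.test("« Où est Paris ? »")); // true
console.log(FRENCH_SENTENCE.test("L'Haÿ-les-Roses"));    // false: no literal "-" in the class
```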

Tom Auger
1

Simply use the following code:

/[\u00C0-\u017F]/
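Without anchors, this pattern returns true whenever the string contains at least one character from U+00C0 to U+017F, whatever else is in it. A JavaScript sketch contrasting it with an anchored validator built around the same range; `ALLOWED` and its extra characters (ASCII letters, space, dot, hyphen) are an assumption based on the question's requirements. The range also includes × (U+00D7) and ÷ (U+00F7).

```javascript
// The answer's pattern: true if ANY character falls in U+00C0 to U+017F.
const containsAccent = /[\u00C0-\u017F]/;
// Hypothetical full validator using the same range, anchored at both ends.
const ALLOWED = /^[A-Za-z\u00C0-\u017F .-]*$/;

console.log(containsAccent.test("café!!!")); // true: one matching character is enough
console.log(ALLOWED.test("café!!!"));        // false: "!" is not allowed
console.log(ALLOWED.test("Crème brûlée"));   // true
```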
lookly Dev
0

This regex passes through the entire French text of Cyrano de Bergerac (you will need to remove the markup characters first): http://www.gutenberg.org/files/1256/1256-8.txt

^([0-9A-Za-z\u00C0-\u017F\ ,.\;'\-()\s\:\!\?\"])+
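This pattern is anchored with `^` but has no `$`, so in a validation context it only checks that the string starts with an allowed character. A JavaScript sketch showing the difference; the second pattern, with `$` added, is an illustrative variant, not the answer's original.

```javascript
// The answer's pattern, copied as-is: anchored at the start only.
const startOnly = /^([0-9A-Za-z\u00C0-\u017F\ ,.\;'\-()\s\:\!\?\"])+/;
// Same class with "$" added, so the whole string must match.
const wholeString = /^([0-9A-Za-z\u00C0-\u017F\ ,.\;'\-()\s\:\!\?\"])+$/;

console.log(startOnly.test("Bonjour <b>"));   // true: the "Bonjour " prefix matches
console.log(wholeString.test("Bonjour <b>")); // false: "<" is rejected
```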
PyWebDesign
0

All French and Spanish accents:

/^[a-zA-ZàâäæáãåāèéêëęėēîïīįíìôōøõóòöœùûüūúÿçćčńñÀÂÄÆÁÃÅĀÈÉÊËĘĖĒÎÏĪĮÍÌÔŌØÕÓÒÖŒÙÛÜŪÚŸÇĆČŃÑ .-]*$/

-2

This might suit:

/^[ a-zA-Z\xBF-\xFF\.-]+$/

It lets a few extra characters through, such as ÷, but it handles quite a few of the accented characters.
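The extra characters can be listed with a short JavaScript check, assuming an ES2018+ engine for `\p{L}`: in U+00BF to U+00FF only ¿, × and ÷ are not letters. The range also stops before Œ, œ and Ÿ (U+0152, U+0153, U+0178), so those French letters are rejected.

```javascript
// Every character in the range \xBF-\xFF (U+00BF to U+00FF).
const range = [];
for (let code = 0xbf; code <= 0xff; code++) range.push(String.fromCharCode(code));

// Keep only the characters that are not Unicode letters.
const nonLetters = range.filter((ch) => !/\p{L}/u.test(ch));
console.log(nonLetters.join(" ")); // ¿ × ÷
```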

nickf
-3

/[A-Za-z-\.\s]/u should work. The /u modifier is for UTF-8 encoding.

Pragati Sureka