I need a function or a regular expression to validate strings that contain only alphabetic characters (including French ones), the minus sign (-), the dot (.) and the space, and nothing else.
Thanks
/^[a-zàâçéèêëîïôûùüÿñæœ .-]*$/i
The /i flag makes the match case-insensitive, which keeps the class simpler. If you don't want to allow empty strings, change * to +.
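As a rough sketch of how this class behaves, here is the same pattern in Python (an assumption, since the question does not name a language). The helper name is_french_name and its allow_empty parameter are hypothetical, re.IGNORECASE plays the role of /i, and re.fullmatch stands in for the ^ and $ anchors.

```python
import re

# Same character class as in the answer above; re.IGNORECASE mirrors /i.
FRENCH_CLASS = re.compile(r"[a-zàâçéèêëîïôûùüÿñæœ .-]*", re.IGNORECASE)


def is_french_name(text: str, allow_empty: bool = True) -> bool:
    """Return True if text contains only the allowed characters.

    allow_empty mirrors the choice between * (empty allowed) and + (not allowed).
    """
    if not text:
        return allow_empty
    # fullmatch requires the whole string to match, like ^...$ in the original.
    return FRENCH_CLASS.fullmatch(text) is not None
```

For example, is_french_name("Jean-François Lefèvre") is accepted, while is_french_name("Jean2") is rejected because of the digit.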
Simplified solution:
/^[a-zA-ZÀ-ÿ-. ]*$/
Explanation:
^ Start of the string
[ ... ]* Zero or more of the following:
a-z Lowercase letters
A-Z Uppercase letters
À-ÿ Accented lowercase and uppercase letters in the range U+00C0 to U+00FF, including letters with an umlaut (this range also admits × and ÷, and does not cover Œ, œ or Ÿ)
- dashes
. periods
spaces
$ End of the string
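To see what the À-ÿ range does and does not accept, here is a small Python sketch (Python is an assumption; the answer does not name an engine). The function name matches_latin1_class is hypothetical, and the hyphen is moved to the end of the class so that it cannot be read as part of a range.

```python
import re

# Same class as above, with the hyphen placed last to keep it a literal.
LATIN1_CLASS = re.compile(r"[a-zA-ZÀ-ÿ. -]*")


def matches_latin1_class(text: str) -> bool:
    """Return True if every character of text falls inside the class."""
    return LATIN1_CLASS.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["Noël", "×÷", "cœur"]:
        print(sample, matches_latin1_class(sample))
```

"Noël" is accepted, "×÷" is accepted too because both signs sit inside the range, and "cœur" is rejected because œ (U+0153) lies outside it.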
Try:
/^[\p{L}-. ]*$/u
This says:
^ Start of the string
[ ... ]* Zero or more of the following:
\p{L} Unicode letter characters
- dashes
. periods
spaces
$ End of the string
/u Enable Unicode mode in PHP
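Python's built-in re module has no \p{L}, so the following is only an approximation of the same idea, assuming Python rather than PHP: it checks each character's Unicode general category with unicodedata. The names is_letters_only and EXTRA_ALLOWED are hypothetical.

```python
import unicodedata

# The non-letter characters the pattern allows: dash, period, space.
EXTRA_ALLOWED = set("-. ")


def is_letters_only(text: str) -> bool:
    """Return True if text holds only Unicode letters, dashes, periods, spaces.

    A general category starting with "L" (Lu, Ll, Lt, Lm, Lo) corresponds
    to \\p{L}. The empty string passes, like the * quantifier.
    """
    return all(
        unicodedata.category(ch).startswith("L") or ch in EXTRA_ALLOWED
        for ch in text
    )
```

Like \p{L}, this accepts letters from every script, not just French, and it rejects combining marks, so text in decomposed form (e followed by a combining acute accent) would need normalizing first.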
The character class I've been using is the following:
[\wÀ-Üà-øoù-ÿŒœ]
This covers a slightly larger character set than French alone, but excludes a large portion of the Eastern European and Scandinavian diacritics and letters that are not relevant to French. Note that \w also admits digits and the underscore. I find this a decent compromise between brevity and exclusivity.
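A quick way to probe this class is the Python sketch below. It assumes the original engine treats \w as ASCII-only, which re.ASCII imitates here; the function name matches_word_class is hypothetical.

```python
import re

# re.ASCII limits \w to [a-zA-Z0-9_]; the literal ranges still use code points.
WORD_CLASS = re.compile(r"[\wÀ-Üà-øoù-ÿŒœ]+", re.ASCII)


def matches_word_class(text: str) -> bool:
    """Return True if text is non-empty and made only of class characters."""
    return WORD_CLASS.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["été", "Œuvre", "prix_2", "Łódź"]:
        print(sample, matches_word_class(sample))
```

"été" and "Œuvre" pass, "prix_2" also passes because \w covers digits and the underscore, and "Łódź" fails because Ł and ź fall outside the listed ranges.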
To match/validate complete sentences, I use this expression:
[\w\s.,!?:;&#%’'"()«»À-Üà-øoù-ÿŒœ]
This includes punctuation and French-style quotation marks.
This regex matches the whole French text of Cyrano de Bergerac (you will need to remove the markup characters first): http://www.gutenberg.org/files/1256/1256-8.txt
^([0-9A-Za-z\u00C0-\u017F\ ,.\;'\-()\s\:\!\?\"])+
This class lists the French and Spanish accented letters explicitly, along with a few others:
/^[a-zA-ZàâäæáãåāèéêëęėēîïīįíìôōøõóòöœùûüūúÿçćčńñÀÂÄÆÁÃÅĀÈÉÊËĘĖĒÎÏĪĮÍÌÔŌØÕÓÒÖŒÙÛÜŪÚŸÇĆČŃÑ .-]*$/
This might suit:
/^[ a-zA-Z\xBF-\xFF\.-]+$/
It lets a few extra characters in, such as ¿, × and ÷, but it handles quite a few of the accented characters.
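To check exactly which extra characters the \xBF-\xFF range admits, one option is to enumerate the code points and keep those that are not letters. This is a Python sketch under the assumption that the range is interpreted as code points U+00BF to U+00FF; the function name non_letters_in_range is hypothetical.

```python
def non_letters_in_range(start: int, end: int) -> list[str]:
    """Return the characters in [start, end] (inclusive) that are not letters."""
    return [chr(cp) for cp in range(start, end + 1) if not chr(cp).isalpha()]


if __name__ == "__main__":
    # Code points covered by \xBF-\xFF in the pattern above.
    print(non_letters_in_range(0xBF, 0xFF))  # ['¿', '×', '÷']
```

Starting the range at \xC0 instead would drop ¿; excluding × and ÷ requires splitting the range around \xD7 and \xF7.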