
I've always struggled with RegEx, so forgive me if this seems like an awful approach to my problem.

When users enter first and last names, I started off with a basic check for upper and lower case letters, white space, apostrophes and hyphens:

if (!preg_match("/^[a-zA-Z\s'-]+$/", $name)) { /* Error */ }

Now I realise this isn't the best, since people could have names such as Dr. Martin Luther King, Jr. (with commas and full stops). So I assume changing it to this would make it slightly more effective:

if (!preg_match("/^[a-zA-Z\s,.'-]+$/", $name)) { /* Error */ }

I then saw the name of a girl I know on Facebook, who writes it as Siân, which got me thinking about names containing accents and umlauts, as well as Japanese/Chinese/Korean/Russian characters. So I started searching and found that I could list each of these characters explicitly, like so:

if (!preg_match("/^[a-zA-Z\sàáâäãåèéêëìíîïòóôöõøùúûüÿýñçčšžÀÁÂÄÃÅÈÉÊËÌÍÎÏÒÓÔÖÕØÙÚÛÜŸÝÑßÇŒÆČŠŽ∂ð ,.'-]+$/u", $first_name)) { /* Error */ }

As you can imagine, it's extremely long-winded, and I'm pretty certain there is a much simpler RegEx that can achieve this. As I said, I've searched around, but this is the best I can do.

So, what is a good way to check for upper and lower case characters, commas, full stops, apostrophes, hyphens, umlauts, and Latin, Japanese, Russian etc. letters?

no.

3 Answers

You can use a Unicode character class. \pL covers pretty much all letter symbols.
http://php.net/manual/en/regexp.reference.unicode.php

 if (!preg_match("/^[a-zA-Z\s,.'-\pL]+$/u", $name))

See also http://www.regular-expressions.info/unicode.html, but beware that PHP/PCRE only understands the abbreviated class names.
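As a minimal sketch of the same idea: the version below moves the hyphen to the end of the character class so it cannot be read as the start of a range (some PCRE versions may reject a hyphen placed directly before `\pL`), and drops `a-zA-Z` because `\pL` already includes those letters. The function name `is_plausible_name` is made up for illustration, and the sample names assume the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper; the allowed punctuation is the set used in the pattern above.
function is_plausible_name(string $name): bool
{
    // \pL  = any Unicode letter, \s = white space, then , . ' and a literal hyphen.
    // The hyphen is last in the class so it is taken literally.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match("/^[\pL\s,.'-]+$/u", $name) === 1;
}

var_dump(is_plausible_name("Siân"));
var_dump(is_plausible_name("Dr. Martin Luther King, Jr."));
var_dump(is_plausible_name("Søren Strauß"));
var_dump(is_plausible_name("R2-D2")); // digits are not in the class
```

Note that `\pL` only covers letters; if names may arrive with separate combining accents, something like `\pM` would probably need to be added as well, but that is an assumption about the input rather than something tested here.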

mario
  • Ah thank you very much, why couldn't I find this earlier, ha! Could you tell me what you mean by abbreviated class names? – no. Nov 04 '11 at 18:23
  • @HelloJoe: It's not the most obvious feature. Only found the documentation in the PHP manual pretty late. Abbreviations: PCRE only supports `\p{L}` not `\p{Letter}` or `\p{Russian}` for example. – mario Nov 04 '11 at 19:36
  • You have `,.'` in there, you might want to remove it as it is a name. – matrixdevuk Dec 08 '13 at 11:01
  • Perfect, thank you. But the expression contains a small error; the correct one is: `/^[a-zA-Z\s,.'\-\pL]+$/u` or `/^[a-z\s,.'-\pL]+$/iu` – mrDinkelman Apr 01 '16 at 12:52
  • it doesn't cover æøå or ß – TheCrazyProfessor Nov 22 '18 at 09:22

\pL already includes a-z and A-Z, therefore the mentioned pattern "/^[a-zA-Z\s,.'-\pL]+$/u" could be simplified to

"/^[\s,.'-\pL]+$/"

also the modifier u is not required.

staabm
  • Though I initially intended to +1 due to the mention of `a-zA-Z` being redundant, I must mention that the `u` modifier is certainly required, as otherwise PHP does not support multi-byte encodings. – dotancohen Dec 08 '13 at 05:21
  • I tested it on my DEV machine and it worked for me even without the `u` modifier – staabm Mar 12 '14 at 14:58
  • Were you using a UTF-8 or other multibyte encoding, or a single-byte encoding such as ASCII or latin1? The `u` modifier is not necessary for single-byte encodings. – dotancohen Mar 12 '14 at 15:12
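One way to check the `u` question on your own setup is to run the same pattern with and without the modifier against a UTF-8 string. This is only a sketch: it assumes the source file is saved as UTF-8, and it prints the result without `u` instead of predicting it, since that result depends on how the individual bytes happen to be classified.

```php
<?php
$name = "Siân"; // in UTF-8, "â" is stored as two bytes, so the string is 5 bytes long

// With u: the subject is decoded as UTF-8, so "â" is a single letter for \pL.
$withU = preg_match("/^\pL+$/u", $name);

// Without u: the pattern is applied byte by byte, so "â" is seen as two
// separate single-byte characters rather than one letter.
$withoutU = preg_match("/^\pL+$/", $name);

var_dump(strlen($name), $withU, $withoutU);
```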

There could probably be some loosening of the qualifications by allowing other types of punctuation.

One thing that should be a restriction is requiring at least one letter.

if (!preg_match("/^[\s,.'-]*\p{L}[\p{L}\s,.'-]*$/u", $name))
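A small sketch of how that pattern behaves, wrapped in a made-up function name (`has_letter_and_allowed_chars` is only for illustration). The leading `[\s,.'-]*` allows punctuation or spaces first, `\p{L}` then forces at least one letter, and the rest may be any mix of letters and the allowed characters.

```php
<?php
// Hypothetical wrapper around the pattern from this answer.
function has_letter_and_allowed_chars(string $name): bool
{
    return preg_match("/^[\s,.'-]*\p{L}[\p{L}\s,.'-]*$/u", $name) === 1;
}

var_dump(has_letter_and_allowed_chars("O'Brien"));   // contains letters
var_dump(has_letter_and_allowed_chars("Jean-Luc"));  // hyphen between letters
var_dump(has_letter_and_allowed_chars(" - . ,"));    // punctuation only, no letter
```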