4

I would like to create a regex which validates a name of a person. These should be allowed:

  • Letters (uppercase and lowercase)
  • -
  • spaces

This is pretty easy to create a regex for. The problem is that some people also use special characters in their names. For example, assume a user named gûnther or François. There are a lot of characters like û and ç available and it's hard to list all of these.

Is there an easy way to check for correct human names?

Bart Kiers
Bv202
  • What about Chinese or Cyrillic names? What character set are you operating with? – Pekka Feb 24 '11 at 13:26
  • Names can also contain quotes, like in "O'Neill", for example. – Capsule Feb 24 '11 at 13:27
  • Chinese or Cyrillic names are not allowed... there should be a limit. I'm not sure which character set is best to use... – Bv202 Feb 24 '11 at 13:29
  • You can use the solution in this question: http://stackoverflow.com/questions/888838/regular-expression-for-validating-names-and-surnames – SERPRO Feb 24 '11 at 13:29
  • Defining a character list for all human names may be beyond the scope of regex. From a non-programming POV, could you provide an incentive to give a real name? Or just exclude the characters you don't want. – Russell Dias Feb 24 '11 at 13:31

6 Answers

7

Is there an easy way to check for correct human names?

This has been discussed several times. I'm fairly certain that the only thing people can agree on is that, in order to exist, a name cannot be an empty string, thus:

^.+$

(Yes, I am aware that this is probably not what OP is looking for. I'm just summarizing earlier Q&As.)
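
As a rough illustration of how little this pattern checks, here is a minimal Python sketch. The function name `is_non_empty_line` is hypothetical, and `re.fullmatch` stands in for the `^` and `$` anchors.

```python
import re

# Same idea as ^.+$ : one or more characters, none of them a newline.
NON_EMPTY_LINE = re.compile(r".+")


def is_non_empty_line(name: str) -> bool:
    """Return True for any non-empty string that contains no newline."""
    return NON_EMPTY_LINE.fullmatch(name) is not None


for candidate in ["François", "42", "^&*", "", "a\nb"]:
    print(repr(candidate), is_non_empty_line(candidate))
```

Only the empty string and the string containing a newline are rejected here; `42` and `^&*` pass just like a real name does.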

jensgram
  • This is very poor/loose validation. It would permit `42`, `^&*`, and `><`. This makes no attempt to validate based on the OP's requirements. This will effectively defend against a zero-width string and a string with newline characters, though. – mickmackusa Nov 07 '19 at 11:22
7

/^\pL[\pL '-]*\z/ should do the trick
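
For readers not using a PCRE-style engine, the same rule can be sketched in Python. Python's standard `re` module does not support `\pL`, so this sketch swaps in `unicodedata.category` to test for Unicode letters; the names `is_letter` and `is_valid_name` are hypothetical.

```python
import unicodedata

# Non-letter characters the pattern allows after the first position.
ALLOWED_PUNCTUATION = {" ", "'", "-"}


def is_letter(ch: str) -> bool:
    """True if ch is in a Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


def is_valid_name(name: str) -> bool:
    """Mirror ^\\pL[\\pL '-]*\\z : a letter, then letters, spaces, ' or -."""
    if not name or not is_letter(name[0]):
        return False
    return all(is_letter(ch) or ch in ALLOWED_PUNCTUATION for ch in name[1:])


for candidate in ["François", "O'Neill", "Brooks-Schneider", "42", "-Anna"]:
    print(candidate, is_valid_name(candidate))
```

Accents typed as separate combining marks are not letters under this rule, so normalizing the input to NFC first may be worth considering.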

Long Ears
  • @Bv202, it requires the string to start with a Unicode letter (`\pL`) and to consist otherwise only of Unicode letters, spaces, apostrophes and hyphens. – Long Ears Feb 24 '11 at 13:49
  • Should be the accepted answer, rather than the somewhat smart-aleck one above. (Though his point should be considered.) – Parapluie Mar 08 '17 at 20:46
  • @Parapluie IMO no code-only answer should ever be accepted on the Stack Exchange Network -- that would incentivize poor posting practices. – mickmackusa Nov 07 '19 at 11:15
1
^.+$

I checked @jensgram's answer, but that regex accepts any non-empty string, so it doesn't solve the problem: the string needs to be a name, and with that pattern it can be anything.

^[A-Z][a-z]+$

My regex only accepts strings where the first character is an uppercase ASCII letter and the following characters are lowercase ASCII letters. Looking through the other answers, this also seems to be the shortest and simplest regex.
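
To see what this pattern accepts, a small Python sketch can run it against names mentioned elsewhere on this page. The helper name `is_capitalized_ascii` is hypothetical.

```python
import re

# [A-Z][a-z]+ : one uppercase ASCII letter, then one or more lowercase ASCII letters.
CAPITALIZED_ASCII = re.compile(r"[A-Z][a-z]+")


def is_capitalized_ascii(name: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    return CAPITALIZED_ASCII.fullmatch(name) is not None


for candidate in ["Bart", "François", "gûnther", "O'Neill", "Brooks-Schneider"]:
    print(candidate, is_capitalized_ascii(candidate))
```

Of these, only `Bart` passes; accented letters, apostrophes, hyphens and spaces are all rejected.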

Imants Volkovs
1

The short answer is no, there is no easy way. You have touched on the biggest issue: there are so many special cases of accents and extra things hanging off letters that it will become a mess to deal with. Additionally, the expression will break down to something like this:

^[CAPITAL_LETTERS][ALL_LETTERS_AND_SYMBOLS]*$

That is not that helpful because "Abcd" fits that and you have no way to know if someone is incorrectly entering info into the field or if it was a crazy Hollywood parent that actually named their kid that or something like Sandwich or Umbrella.

unholysampler
0

I had the same problem. First I came up with something like

    preg_match("/^[a-zA-Z]{1,}([\s-]*[a-zA-Z\s\'-]*)$/", $name)

but then I realized that non-ASCII characters used in countries like Sweden or China, for example Õ or å, would not be allowed. That mattered to me because my site is international and I don't want to prevent users from entering their real names.

I thought it might be easier, instead of trying to figure out how to allow names like O'Malley, Brooks-Schneider and Õsmar (made that one up :), to catch the characters that you don't want users to enter. For me the goal was basically to prevent XSS JavaScript code from being entered. So I use the following regex to detect any characters that might be harmful.

    preg_match("/[~!@#\$%\^&\*\(\)=\+\|\[\]\{\};\\:\",\.\<\>\?\/]+/", $name)

That way they can enter any name they want except chars that really aren't part of any name. Hope this might be useful.
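
The same blacklist idea can be sketched in Python. This is an illustration under assumptions, not a port of the code above: the set lists the characters the PHP pattern rejects, and the names `FORBIDDEN_CHARS` and `contains_forbidden_char` are hypothetical.

```python
# Characters rejected by the blacklist; everything else, including
# apostrophes, hyphens, spaces and accented letters, is allowed.
FORBIDDEN_CHARS = set('~!@#$%^&*()=+|[]{};:",.<>?/')


def contains_forbidden_char(name: str) -> bool:
    """Return True if any character of name is on the blacklist."""
    return any(ch in FORBIDDEN_CHARS for ch in name)


for candidate in ["O'Malley", "Brooks-Schneider", "Õsmar", "<script>"]:
    print(candidate, contains_forbidden_char(candidate))
```

A name is accepted when the function returns `False`, which is the case for the first three candidates but not for `<script>`.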

  • The pattern modifier `i` will spare needing to write both uppercase and lowercase letters in your character classes. `{1,}` is simply written as `+`. Escaping the single quote is unnecessary. The capture group is unnecessary. – mickmackusa Nov 07 '19 at 11:27
0

I don't know exactly what you are trying to do (validate user name input?) but basically I would keep it simple - fail the validation if the text contains numbers. And even that's probably pretty shaky.
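
A minimal Python sketch of that rule, assuming "numbers" means any digit character; the function name `contains_digit` is hypothetical.

```python
def contains_digit(name: str) -> bool:
    """Return True if any character in name is a digit."""
    return any(ch.isdigit() for ch in name)


for candidate in ["Richard", "R2D2", "42"]:
    print(candidate, contains_digit(candidate))
```

Validation would fail whenever the function returns `True`.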

Richard H