27

Just starting to explore the 'wonders' of regex. Being someone who learns from trial and error, I'm really struggling because my trials are throwing up a disproportionate amount of errors... My experiments are in PHP using ereg().

Anyway. I work with first and last names separately but for now using the same regex. So far I have:

^[A-Z][a-zA-Z]+$  

Any length string that starts with a capital and has only letters (capital or not) for the rest. But where I fall apart is dealing with the special situations that can pretty much occur anywhere.

  • Hyphenated Names (Worthington-Smythe)
  • Names with Apostrophes (D'Angelo)
  • Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.
  • Joint Names (Ben & Jerry)

Maybe there's some other way a name can be that I'm not thinking of, but I suspect if I can get my head around this, I can add to it. I'm pretty sure there will be instances where more than one of these situations comes up in one name.

So, I think the bottom line is to have my regex also accept a space, hyphens, ampersands and apostrophes - but not at the start or end of the name to be technically correct.

sth
Humpton
  • It IS possible to have hyphenated names with apostrophes, such as O'Brien-O'Malley. – DOK Nov 08 '08 at 21:00
  • I have no doubt that they might come up. Although, I'd beat my parents if they did that to me... – Humpton Nov 08 '08 at 21:06
  • I'd be inclined to beat my parents up for a regular hyphenated name. Having an unusual (reads: foreign) surname is bad enough. – Matthew Scharley Nov 09 '08 at 00:00
  • If you are going to require the first letter to be capital, might it be more friendly to allow the user the option to enter a string beginning with a lower case letter and then capitalize it using ucfirst()? – RMD Developer Jul 04 '11 at 08:44
  • possible duplicate of [Regular expression for validating names and surnames?](http://stackoverflow.com/questions/888838/regular-expression-for-validating-names-and-surnames) – outis Dec 15 '11 at 01:19
  • Bear in mind that some names don't start with a capital letter, e.g. "[de la Tour](https://en.wikipedia.org/wiki/Frances_de_la_Tour)". – Matt Gibson Dec 13 '14 at 15:11

27 Answers

53

This regex is perfect for me.

^([ \u00c0-\u01ffa-zA-Z'\-])+$

It works fine in php environments using preg_match(), but doesn't work everywhere.

It matches Jérémie O'Co-nor, so it should cover names written with accented Latin letters (U+00C0 to U+01FF), though not other scripts such as Cyrillic or Chinese.
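To see what that character range does and does not cover, here is a small sketch in Python rather than PHP (an assumption made for illustration; Python's re module accepts \uXXXX escapes directly). Every sample name other than Jérémie O'Co-nor is made up.

```python
import re

# Pattern from the answer above: space, U+00C0-U+01FF, ASCII letters, ' and -
NAME_RE = re.compile(r"^([ \u00c0-\u01ffa-zA-Z'\-])+$")

# Hypothetical sample inputs: two Latin-script names, one Cyrillic, one with a digit
samples = ["Jérémie O'Co-nor", "Łukasz Żak", "Иван Петров", "John3"]

for s in samples:
    # Prints whether each sample is accepted by the pattern
    print(s, bool(NAME_RE.match(s)))
```

The Cyrillic sample is rejected because its letters sit outside the U+00C0 to U+01FF range.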

Glenn
Daan
45
  • Hyphenated Names (Worthington-Smythe)

Add a - into the second character class. The easiest way to do that is to add it at the start so that it can't possibly be interpreted as a range modifier (as in a-z).

^[A-Z][-a-zA-Z]+$
  • Names with Apostrophes (D'Angelo)

A naive way of doing this would be as above, giving:

^[A-Z][-'a-zA-Z]+$

Don't forget you may need to escape it inside the string! A 'better' way, given your example might be:

^[A-Z]'?[-a-zA-Z]+$

Which will allow a possible single apostrophe in the second position.

  • Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.

Here I'd be tempted to just do our naive way again:

^[A-Z]'?[- a-zA-Z]+$

A potentially better way might be:

^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$

Which looks for extra words at the end. This probably isn't a good idea if you're trying to match names in a body of extra text, but then again, the original wouldn't have done that well either.

  • Joint Names (Ben & Jerry)

At this point you're not looking at single names anymore?

Anyway, as you can see, regexes have a habit of growing very quickly...
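The progression above can be tried out with a short script. This is only a sketch in Python (an assumption; the question uses PHP), and the labels and the accepted_by helper are hypothetical names introduced for illustration. The patterns themselves are copied from the question and the steps above.

```python
import re

# Each step loosens the previous pattern a little.
STEPS = {
    "letters only": r"^[A-Z][a-zA-Z]+$",
    "plus hyphen": r"^[A-Z][-a-zA-Z]+$",
    "plus apostrophe": r"^[A-Z]'?[-a-zA-Z]+$",
    "plus space": r"^[A-Z]'?[- a-zA-Z]+$",
}

NAMES = ["Humpton", "Worthington-Smythe", "D'Angelo", "Van der Humpton", "Ben & Jerry"]


def accepted_by(name):
    """Return the labels of every step whose pattern accepts the name."""
    return [label for label, pattern in STEPS.items() if re.match(pattern, name)]


for name in NAMES:
    print(f"{name!r}: {accepted_by(name)}")
```

None of the steps accepts "Ben & Jerry", since the ampersand never enters a character class.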

Matthew Scharley
  • Thanks for the step by step - I will explore likewise! The Ben & Jerry thing comes from the nature of what I'm doing is couples often sign up as one user. I don't want what I do to bork the '&' if they do. I think I can see how to get it in, probably not in the last name check. – Humpton Nov 08 '08 at 20:52
  • This doesn't handle international names. One of the comments below pointed out the use of \p{L} but you can read a lot more about unicode character classes at http://www.regular-expressions.info/unicode.html – Kimball Robinson Aug 18 '10 at 17:44
  • Even though I never said this explicitly, this was never an attempt at a solution, rather it was an extended attempt at showing why a solution isn't really feasible. Names are just far too ambiguous to be able to validate with any reliability. Even something as simple as length restrictions can go afoul (I know someone with a single character first name). – Matthew Scharley Dec 19 '12 at 14:13
  • Ruby on Rails regular expressions [Security Guide](http://guides.rubyonrails.org/security.html#regular-expressions) now suggest you shouldn't use line start and end in validations (^ and $) for this is a security threat (possible to exploit javascript). Use [string start/end](http://stackoverflow.com/questions/577653/difference-between-a-z-and-in-ruby-regular-expressions) as \A and \z instead. – Andres Feb 15 '16 at 17:36
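The anchoring concern in the last comment has a rough Python counterpart, sketched here for illustration (Python is an assumption; the comment itself is about Ruby). In Python's re module, $ also matches just before a trailing newline, and with re.MULTILINE both ^ and $ match at every line, so re.fullmatch or \A...\Z is the stricter choice. The payload string is made up.

```python
import re

PATTERN = r"[A-Z][a-zA-Z]+"

# With ^...$ and MULTILINE, one well-formed line is enough to pass.
payload = "Humpton\n<script>alert(1)</script>"
loose = re.search(rf"^{PATTERN}$", payload, re.MULTILINE)

# Even without MULTILINE, $ tolerates a single trailing newline.
trailing = re.match(rf"^{PATTERN}$", "Humpton\n")

# fullmatch (or \A...\Z) requires the entire string to match.
strict_payload = re.fullmatch(PATTERN, payload)
strict_trailing = re.match(rf"\A{PATTERN}\Z", "Humpton\n")

print(bool(loose), bool(trailing), bool(strict_payload), bool(strict_trailing))
```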
17

THE BEST REGEX EXPRESSIONS FOR NAMES:

  • I will use the term special character to refer to the following three characters:
    1. Hyphen -
    2. Apostrophe '
    3. Dot .
  • Spaces and special characters cannot appear twice in a row (e.g.: -- or '. or .. )
  • Trimmed (No spaces before or after)
  • You're welcome ;)

Mandatory single name, WITHOUT spaces, WITHOUT special characters:

^([A-Za-z])+$
  • Sierra is valid, Jack Alexander is invalid (has a space), O'Neil is invalid (has a special character)

Mandatory single name, WITHOUT spaces, WITH special characters:

^[A-Za-z]+(((\'|\-|\.)?([A-Za-z])+))?$
  • Sierra is valid, O'Neil is valid, Jack Alexander is invalid (has a space)

Mandatory single name, optional additional names, WITH spaces, WITH special characters:

^[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])+))*$
  • Jack Alexander is valid, Sierra O'Neil is valid

Mandatory single name, optional additional names, WITH spaces, WITHOUT special characters:

^[A-Za-z]+((\s)?([A-Za-z])+)*$
  • Jack Alexander is valid, Sierra O'Neil is invalid (has a special character)
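The four patterns above can be checked against the listed examples with a few lines of Python (an assumption; any PCRE-style engine should behave the same for these inputs). The dictionary keys and the is_valid helper are hypothetical names used only here.

```python
import re

PATTERNS = {
    "single, no specials": r"^([A-Za-z])+$",
    "single, specials": r"^[A-Za-z]+(((\'|\-|\.)?([A-Za-z])+))?$",
    "multiple, specials": r"^[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])+))*$",
    "multiple, no specials": r"^[A-Za-z]+((\s)?([A-Za-z])+)*$",
}


def is_valid(kind, name):
    """True if the name matches the pattern registered under `kind`."""
    return re.match(PATTERNS[kind], name) is not None


for kind in PATTERNS:
    for name in ("Sierra", "O'Neil", "Jack Alexander", "Sierra O'Neil"):
        print(f"{kind:22} {name:15} {is_valid(kind, name)}")
```

Because the multi-name patterns nest one quantifier inside another, they can backtrack heavily on long inputs that do not match, so capping the input length before matching seems prudent.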

SPECIAL CASE

Many modern smart devices add spaces at the end of each word, so in my applications I allow unlimited number of spaces before and after the string, then I trim it in the code behind. So I use the following:

Mandatory single name + optional additional names + spaces + special characters:

^(\s)*[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])+))*(\s)*$

Add your own special characters

If you wish to add your own special characters, let's say an underscore _ this is the group you need to update:

(\'|\-|\.)

To

(\'|\-|\.|\_)

PS: If you have questions comment here and I will receive an email and respond ;)

Taher Ahmed
  • Thanks for this. Had a suggestion: the current regex does not match names like "Robert Downey Jr." which end in a special character - the regex needs ([A-Za-z])* instead of ([A-Za-z])+... so the final regex looks like ^(\s)*[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])*))*(\s)*$ – Jaspal Singh Jul 12 '17 at 15:15
  • The best for me. – Chimdi May 09 '20 at 18:18
  • How can I require at least 3 characters? – shanmkha May 25 '23 at 19:15
6

While I agree with the answers saying you basically can't do this with regex, I will point out that some of the objections (internationalized characters) can be resolved by using UTF strings and the \p{L} character class (matches a unicode "letter").
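Python's built-in re module does not support \p{L}, so the idea needs a stand-in there. The sketch below assumes Python 3 str patterns, where [^\W\d_] (a word character that is neither a digit nor an underscore) approximates a Unicode letter; it may still admit a few numeric symbols. The looks_like_name helper and the separator set are hypothetical choices.

```python
import re

# One or more "letters": word characters minus digits and underscore.
LETTERS = r"[^\W\d_]+"

# Letter runs joined by a single space, apostrophe or hyphen.
NAME_RE = re.compile(rf"{LETTERS}(?:[ '-]{LETTERS})*")


def looks_like_name(text):
    """True if the whole string is letter runs joined by single separators."""
    return NAME_RE.fullmatch(text) is not None


for sample in ("Jérémie O'Co-nor", "Иван Петров", "李小龍", "John3"):
    print(sample, looks_like_name(sample))
```

In PHP the equivalent would use \p{L} with the /u modifier, as the answer describes.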

eyelidlessness
  • You can read more about unicode and regular expressions at http://www.regular-expressions.info/unicode.html – Kimball Robinson Aug 18 '10 at 17:44
  • regular-expressions.info/unicode.html says "To match a letter including any diacritics, use \p{L}\p{M}*+." Doesn't work in python tho, I think. – dfrankow Jun 18 '18 at 15:14
5

Security tip: make sure to validate the length of the string before this step, to avoid a DoS attack that brings down your system by sending very long strings.

Check this out:

^(([A-Za-z]+[,.]?[ ]?|[a-z]+['-]?)+)$


You can test it here : https://regex101.com/r/mS9gD7/46
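A minimal sketch of the length check, in Python (an assumption), combined with one possible way to reduce backtracking. MAX_NAME_LENGTH is a hypothetical limit, and SAFE_NAME_RE is an illustrative rewrite that is stricter than, and not equivalent to, the regex above: every letter run must be followed by exactly one separator, so the engine has only one way to split the input.

```python
import re

MAX_NAME_LENGTH = 100  # hypothetical cap; choose one that suits your data

# Letter runs joined by exactly one separator token (", ", ". ", ",", ".",
# space, apostrophe or hyphen), with an optional trailing comma or dot.
SAFE_NAME_RE = re.compile(r"[A-Za-z]+(?:(?:[,.] ?|[ '-])[A-Za-z]+)*[,.]?")


def is_valid_name(name):
    # Reject oversized input before the regex engine ever sees it.
    if len(name) > MAX_NAME_LENGTH:
        return False
    return SAFE_NAME_RE.fullmatch(name) is not None


for sample in ("John Smith", "Smith, John", "Robert Downey Jr.", "John  Smith"):
    print(repr(sample), is_valid_name(sample))
```

The length cap bounds the work per request. The rewritten pattern addresses the nested-quantifier concern raised in the comments, at the cost of rejecting some inputs the original accepts.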

tk_
  • [a-z]+ -> This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. It can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match. – MiKr13 Jul 17 '21 at 09:49
  • So what is the alternative? The size of the string will be validated before this phase, so I believe that won't affect my code – tk_ Jul 19 '21 at 01:52
  • Hey there, I am just adding the comments on using +, maybe someday someone will add something to mitigate it. I am also trying to search for alternatives, quite frankly! – MiKr13 Jul 19 '21 at 17:07
  • Appreciate the security tip. I modified my answer with that. – tk_ Jul 20 '21 at 02:26
4

I don't really have a whole lot to add to a regex that takes care of names because there are already some good suggestions here, but if you want a few resources for learning more about regular expressions, you should check out:

VirtuosiMedia
  • The PHPBuilder tutorial is very old, and applies only to the 'ereg' (or POSIX) flavor, which is deprecated and scheduled to be removed in PHP 6. The '[preg](http://www.php.net/manual/en/book.pcre.php)' (or PCRE) flavor is what you should use now. – Alan Moore Jan 16 '12 at 16:04
3

Basically, I agree with Paul... You will always find exceptions, like di Caprio, DeVil, or such.

Remarks on your message: in PHP, ereg is generally seen as obsolete (slow, incomplete) in favor of preg (PCRE regexes).
And you should try a regex tester, like the powerful Regex Coach: they are great for quickly testing REs against arbitrary strings.

If you really need to solve your problem and aren't satisfied with the above answers, just ask, and I will give it a go.

Jonathan Leffler
PhiLho
  • Firstly, I'll add exploring preg to my list. Then, I'll investigate a tester. And, I totally accept that people like di Caprio will mess up my first musings... This does have a real use, but mostly it's a learning experience. What appeared here in minutes has given me a lot to go on. – Humpton Nov 08 '08 at 21:03
3

I second the 'give up' advice. Even if you consider numbers, hyphens, apostrophes and such, something like [a-zA-Z] still wouldn't catch international names (for example, those having šđčćž, or Cyrillic alphabet, or Chinese characters...)

But... why are you even trying to verify names? What errors are you trying to catch? Don't you think people know how to write their name better than you? ;) Seriously, the only thing you can do by trying to verify names is to irritate people with unusual names.

Domchi
  • Amen to this. Please do not assume the first letter must be a capital letter. – fortboise May 02 '12 at 14:37
  • Though I agree it will be almost impossible to write a Regex which allows all "valid" names, checking for digits and special characters (such as parenthesis, double quotes and semicolons) will catch some invalid inputs (client-side) before sanitation (server-side) to prevent injection. – Leandri Aug 27 '14 at 11:53
  • @LeandriMidoriViviers - bad idea to fight injection with anything but prepared statements, it's simply not wise to limit the input. There are no invalid inputs, and your "illegal characters" perfectly illustrate my point - there are names with digits (Michael F. Johnson 2nd), nicknames are traditionally written with double quotes (Tom "Iceman" Kazanski), and I'd bet that some lunatic is thinking of using parenthesis and semicolons while naming his child as well. Think of all Little Bobby Tableses of the world. – Domchi Aug 28 '14 at 13:13
  • @Domchi I think you might have misunderstood. I did not mean checking for invalid input will prevent injection, rather that you should not only validate on client-side, but also sanitize input on server-side (using prepared statements, parameterized queries, escaping strings, filter_input functions and such to prevent injection). – Leandri Aug 28 '14 at 14:51
  • @Domchi You should also consider many contact forms are used to communicate with people in a business environment. Is it really necessary to allow users to contact you as Jimmy "BigSock" Jones? The context usually dictates which input values should be considered valid, such as the type of website or form, whether the website is local or international, and, well, you get the picture :) – Leandri Aug 28 '14 at 15:43
  • @LeandriMidoriViviers Agreed. On context as well, but I can't emphasize enough that you never know what is actually valid input in text fields. Phone numbers, names, addresses... just leave them alone people. I can't count the number of times some US-centric site insisted that I supply state where I have none in my address, or tried to prevent me from entering + in phone number, or similar nonsense. – Domchi Aug 29 '14 at 16:36
  • @Domchi `Michael F. Johnson 2nd` could be easily written `Michael F. Johnson II`. Have another example with digits? – reformed Aug 04 '17 at 14:30
  • @reformed Sure. There was an attempt in Sweden to name baby Brfxxccxxmnpcccclllmmnprxvclmnckssqlbb11116, there's American rapper André 3000, and American journalist Jennifer 8. Lee. Some US states restrict names to not include digits, but some, like Kentucky, don't, so names with digits are completely possible. Also, please check this excellent article: http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ – Domchi Aug 05 '17 at 20:02
  • You can use \p{L} or \pL in place of a-z to catch international characters – Chimdi May 09 '20 at 16:50
3

This worked for me:

 +[a-z]{2,3} +[a-z]*|[\w'-]*

This regex will correctly match names such as the following:

jean-claude van damme

nadine arroyo-rodriquez

wayne la pierre

beverly d'angelo

billy-bob thornton

tito puente

susan del rio

It will group "van damme", "arroyo-rodriquez" "d'angelo", "billy-bob", etc. as well as the singular names like "wayne".

Note that it does not test that the grouped stuff is actually a valid name. Like others said, you'll need a dictionary for that. Also, it will group numbers, so if that's an issue you may want to modify the regex.

I wrote this to parse names for a MapReduce application. All I wanted was to extract words from the name field, grouping together the del foo and la bar and billy-bobs into one word to make the key-value pair generation more accurate.
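The grouping behaviour described above can be reproduced with a short Python sketch (Python is an assumption; the answer does not say which engine was used). The name_tokens helper is hypothetical. Because the second alternative can match the empty string, the helper drops empty hits and trims the leading spaces captured by the first alternative.

```python
import re

# Pattern from the answer: a short particle plus the following word, or a
# single run of word characters, apostrophes and hyphens.
TOKEN_RE = re.compile(r" +[a-z]{2,3} +[a-z]*|[\w'-]*")


def name_tokens(name):
    """Split a lowercase name into grouped tokens such as 'van damme'."""
    hits = TOKEN_RE.findall(name.lower())
    return [hit.strip() for hit in hits if hit.strip()]


for name in ("jean-claude van damme", "wayne la pierre", "tito puente"):
    print(name, "->", name_tokens(name))
```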

uke
1

To improve on daan's answer:

^([\u00c0-\u01ffa-zA-Z]+\b['\-]{0,1})+\b$

This allows only a single hyphen or apostrophe between runs of letters (a-z plus the accented range).

The trailing \b also ensures there is no hyphen or apostrophe at the end of the string.

majestic
1
^[A-Z][a-zA-Z '&-]*[A-Za-z]$ 

Will accept anything that starts with an uppercase letter, followed by zero or more of any letter, space, hyphen, ampersand or apostrophes, and ending with a letter.
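A quick check of that pattern against the four cases from the question, sketched in Python (an assumption; the question uses PHP). The rejected samples are made up for illustration.

```python
import re

NAME_RE = re.compile(r"^[A-Z][a-zA-Z '&-]*[A-Za-z]$")

candidates = [
    "Worthington-Smythe",
    "D'Angelo",
    "Van der Humpton",
    "Ben & Jerry",
    "Smythe-",   # ends with a hyphen
    "humpton",   # no leading capital
]

for candidate in candidates:
    print(f"{candidate!r}: {bool(NAME_RE.match(candidate))}")
```

Note that the pattern needs at least two characters, so a single-letter name is rejected.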

Robert Gamble
1

See this question for more related "name-detection" related stuff.

regex to match a maximum of 4 spaces

Basically, you have a problem in that there are effectively no characters in existence that can't form a legal name string.

If you are still limiting yourself to words without ä ü æ ß and other similar non-ASCII characters, get yourself a copy of the UTF-32 character table and realise how many tens of thousands of valid characters there are that your simple regex would miss.

Community
Kent Fredric
1

To add multiple dots in the username use this Regex:

^[a-zA-Z][a-zA-Z0-9_]*\.?[a-zA-Z0-9_\.]*$

String length can be set separately.

sth
1

You can easily neutralize the whole matter of whether letters are upper or lowercase -- even in unexpected or uncommon locations -- by converting the string to all upper case using strtoupper() and then checking it against your regex.

Pat Kelly
1

^[A-Z][a-z]*(([,.] |[ '-])[A-Za-z][a-z]*)*(\.?)( [IVXLCDM]+)?$

For complete details, please visit THIS post. This regex doesn't allow ampersands.

Aman Godara
1

/([\u00c0-\u01ffa-zA-Z'\-]+[ ]?[*]?[\u00c0-\u01ffa-zA-Z'\-]*)+/;

Try this. You can also force the match to start at the first character using ^ and to end at the last character using $.

McDowell
Tatarasanu Victor
0

I have used this, because the name can be part of a file path.

//http://support.microsoft.com/kb/177506
foreach(array('/','\\',':','*','?','<','>','|') as $char)
  if(strpos($name,$char)!==false)
      die("Not allowed char: '$char'");
0

I ran into this same issue, and like many others that have posted, this isn't a 100% fool proof expression, but it's working for us.

/([\-'a-z]+\s?){2,4}/

This will check for any hyphens and/or apostrophes in either the first and/or last name as well as checking for a space between the first and last names. The last part is a little magic that will check for between 2 and 4 names. If you tend to have a lot of international users that may have 5 or even 6 names, you can change that to 5 or 6 and it should work for you.

paviktherin
0

I think "/^[a-zA-Z']+$/" is not enough: it lets a single letter pass. We can set the length by replacing + with {4,20}, which requires between 4 and 20 characters.

0

if you add spaces then "He went to the market on Sunday" would be a valid name.

I don't think you can do this with a regex, you cannot easily detect names from a chunk of text using a regex, you would need a dictionary of approved names and search based on that. Any names not on the list wouldn't be detected.

Osama Al-Maadeed
  • Oh man, where's the name change form - I'm totally changing my name to "H went to the market on Sunday". – Paul Tomblin Nov 08 '08 at 20:46
  • You can't pull names out of a body of text, but you could potentially do a match to see if a given string is a 'valid' name. Why you would bother in production is beyond me, but this isn't production, this is learning regex. – Matthew Scharley Nov 08 '08 at 20:49
  • Right, my attempt is not to find a name in a sentence or paragraph or whatever, but check for some semblance of normality. – Humpton Nov 08 '08 at 20:56
0

I've come up with this RegEx pattern for names:

/^([a-zA-Z]+[\s'.]?)+\S$/

It works. I think you should use it too.

It matches only names or strings like:

Shaquil O'Neil Armstrong

It won't match strings with 2 or more spaces like:

John  Paul

It won't match strings with ending spaces like:

John Paul 

The text above has an ending space. Try highlighting or selecting the text to see the space

Here's what I use to learn and build regex patterns:

RegExr: Learn, Build and Test RegEx

doncadavona
0

Have a nice day !

0

you can use this below for names

^[a-zA-Z'-]{3,}\s[a-zA-Z'-]{3,}$

^ start of the string

$ end of the string

\s space

[a-zA-Z'-]{3,} will accept any name with a length of 3 characters or more, and it includes names with ' or - like jean-luc

So in our case it will only accept names in 2 parts separated by a space


In case of multiple first names you can add a \s (with the hyphen moved to the end of the class so it is not read as a range)

^[a-zA-Z'\s-]{3,}\s[a-zA-Z'-]{3,}$
Aominé
0

The following regex is simple and useful for proper names (Towns, Cities, First Name, Last Name). It allows all international letters without requiring a Unicode-aware regex engine.

It is flexible - you can add/remove characters you want in the expression (focusing on characters you want to reject rather than include).

^(?:(?!^\s|[ \-']{2}|[\d\r\n\t\f\v!"#$%&()*+,\.\/:;<=>?@[\\\]^_`{|}~€‚ƒ„…†‡ˆ‰‹‘’“”•–—˜™›¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½¾¿×÷№′″ⁿ⁺⁰‱₁₂₃₄]|\s$).){1,50}$

Regex matches: from 1 to 50 international letters separated by single delimiter (space -')

Regex rejects: empty prefix/suffix, consecutive delimiters (space - '), digits, new line, tab, limited list of extended ASCII characters

Demo

0

This is what I use for full name:

$pattern = "/^((\p{Lu}{1})\S(\p{Ll}{1,20})[^0-9])+[-'\s]((\p{Lu}{1})\S(\p{Ll}{1,20}))*[^0-9]$/u";
  • Supports all languages
  • Common names("Jane Doe", "John Doe")
  • Useful for composed names("Marie-Josée Côté-Rochon", "Bill O'reilly")
  • Excludes digits(0-9)
  • Only accepts uppercase at beginning of names
  • First and last names from 2-21 characters
  • Adding trim() to remove whitespace
  • Does not accept("John J. William", "Francis O'reilly Jr. III")
  • Must use full names, not: ("John", "Jane", "O'reilly", "Smith")

Edit: It seems that both [^0-9] in the pattern above was matching at least a fourth digit/letter in each of either first and/or last names.

Therefore names of three letters/digits could not be matched.

Here is the edited regular expression:

$pattern = "/^(\p{Lu}{1}\S\p{Ll}{1,20}[-'\s]\p{Lu}{1}\S\p{Ll}{1,20})+([^\d]+)$/u";
kkyucon
-1

Give up. Every rule you can think of has exceptions in some culture or other. Even if that "culture" is geeks who like to legally change their names to "37eet".

Paul Tomblin
-1

Try this regex:

^[a-zA-Z'\s.-]{3,20}\s[a-zA-Z'.-]{3,20}$

Aomine's answer was quite helpful, I tweaked it a bit to include:

  1. Names with dots (middle): Jane J. Samuels

  2. Names with dots at the end: John Simms Snr.

Also, the name must have a minimum of 3 characters, and the surname a minimum of 3, but no more than 20 for each (so a total of 40 characters plus the separating space).

Successful Test cases:

D'amalia Jones    
David Silva Jnr.    
Jay-Silva Thompson
Shay .J. Muhanned
Bob J. Iverson
Ian Campbell