I need to make sure people enter their first, middle and last names correctly in a Rails form. My first thought for a regular expression is:

\A[[:upper:]][[:alpha:]'-]+( [[:upper:]][[:alpha:]'-]*)*\z

That makes sure every word in the name starts with an uppercase letter, followed by letters, hyphens or apostrophes.
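To see how that pattern behaves, here is a minimal Ruby sketch. The constant and the helper name `capitalized_name?` are made up for this illustration; only the regular expression itself comes from the question.

```ruby
# Pattern from the question: every word must start with an uppercase letter.
CAPITALIZED_WORDS = /\A[[:upper:]][[:alpha:]'-]+( [[:upper:]][[:alpha:]'-]*)*\z/

# Hypothetical helper, named only for this example.
def capitalized_name?(name)
  CAPITALIZED_WORDS.match?(name)
end

puts capitalized_name?("Samantha Jones")   # true
puts capitalized_name?("Mary-Anne O'Neil") # true
puts capitalized_name?("samantha jones")   # false: lowercase first letter
puts capitalized_name?("Samantha  Jones")  # false: two spaces between words
puts capitalized_name?("J Smith")          # false: the first word needs 2+ characters
```

Note that the `+` after the first word's character class means a single-letter first word is rejected, while later words (which use `*`) may be a single letter.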

My first question I guess doesn't have much to do with regular expressions, though I'm hoping there's a regular expression I can copy for this. Are letters, hyphens and apostrophes the only characters I should be checking in a name?

My second question is whether it's important to make sure each name has at least one uppercase letter. So many people enter all-lowercase names and I really want to avoid that, but is it sometimes legitimate?

Here's what I have so far that makes sure there's at least 1 uppercase letter somewhere in the name:

\A([[:alpha:]'-]+ )*[[:alpha:]'-]*[[:upper:]][[:alpha:]'-]*( [[:alpha:]'-]+)*\z
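A short Ruby sketch of what this second pattern accepts and rejects. The names `HAS_UPPERCASE` and `name_with_uppercase?` are assumptions for the example, not part of any library.

```ruby
# Pattern from the question: at least one uppercase letter somewhere in the name.
HAS_UPPERCASE = /\A([[:alpha:]'-]+ )*[[:alpha:]'-]*[[:upper:]][[:alpha:]'-]*( [[:alpha:]'-]+)*\z/

# Hypothetical helper, named only for this example.
def name_with_uppercase?(name)
  HAS_UPPERCASE.match?(name)
end

puts name_with_uppercase?("Heinrich von Jungingen") # true: lowercase "von" is allowed
puts name_with_uppercase?("mcDonald")               # true: the uppercase letter may be mid-word
puts name_with_uppercase?("samantha jones")         # false: no uppercase letter at all
puts name_with_uppercase?("Samantha Jones ")        # false: trailing space
```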

Isn't there a [:name:] bracket expression? :)

UPDATE: I added . and , to the characters allowed, surprised I didn't think of them originally. So many people must have to deal with this kind of regular expression! Nobody has any pre-made regular expressions for this sort of thing?

at.
  • Hint: A name can contain *anything* – Jerry Mar 26 '14 at 09:58
  • Well, some people have names like Heinrich von Jungingen and "von" is always lowercase. – Migol Mar 26 '14 at 10:01
  • @Jerry, I guess I want to validate 99.99% of the names. The .01% with a number in their name will have to spell out the number :) – at. Mar 26 '14 at 10:02
  • Haha, some poor fellow is going to be told "incorrect name" by your software :) – Lodewijk Bogaards Mar 26 '14 at 18:47
  • @mrhobo :), the error message I give back will indicate we only accept names to be of that format. – at. Mar 26 '14 at 21:20
  • So, sorry, but your name is of an incorrect format ;-) – Lodewijk Bogaards Mar 27 '14 at 19:46
  • Validating names, with regex or not, is [a horrible idea](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/)! – Biffen Mar 28 '14 at 12:22
  • I'm dealing with kids, some of whom can barely read. I want them to be told how to enter their names better than "samtha jones ". It's a different use case than an international all-encompassing name registry. – at. Mar 29 '14 at 07:13
  • For a non-international use case with names like "Samantha Jones", I'd split this into more than one entry. If you ask for their *first name*, check for an initial cap and upcase if not, check it against your regex for rogue chars, **then** ask for user confirmation ("So, your first name is xxxx, right?"). Once the user confirms, do the same for the last name. These two steps will aid the user with feedback and yield better quality input. – Dave Everitt Mar 29 '14 at 11:00
  • A regex is a poor substitute for an educator... @Dave Everitt: And non-international doesn't exist any more. People aren't trees, they move, become expats. – Chris Wesseling Mar 31 '14 at 17:55
  • Wouldn't it be easier and much less prone to error to simply use a database containing all of the children's names, spelled/capitalized correctly, and then just validate their input against that? Expecting a certain pattern would save you the monotony of entering in all the names individually, but I imagine it would be worth it just to avoid the possible exceptions. – CAustin Apr 03 '14 at 21:36
  • Here are some regexes for many types of names: http://stackoverflow.com/questions/275160/regex-for-names – hunterboerner Apr 07 '14 at 16:25
  • I get your motivation for this application, but if you'll at any point need to separate first, middle, last, you are doomed to bugs due to the spaces-in-names possibility. This is why the rest of the world provides separate fields. Perhaps it's better for the kids to learn how to fill in separate fields as part of what you're doing. – Gene Apr 09 '14 at 12:22
  • @Gene - I do have separate fields, but I also allow spaces between words of a name *within* a first, middle or last name. – at. Apr 11 '14 at 17:13

2 Answers

A good start would be to allow letters, marks, punctuation and whitespace, so that you accept a given name like "María-Jose" and a last name like "van Rossum" (note the whitespace). That boils down to something like:

[\p{Letter}\p{Mark}\p{Punctuation}\p{Separator}]+

If you want to restrict that a bit you could have a look at classes like \p{Lowercase_Letter}, \p{Uppercase_Letter}, \p{Titlecase_Letter}, but there may be scripts that don't have casing. \p{Space_Separator} and \p{Dash_Punctuation} can narrow it down to names that I know. But names I don't...I don't know...
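As a rough Ruby sketch of that permissive class: this version uses the short property names (`L`, `M`, `P`, `Z`) and adds `\A`/`\z` anchors so the whole string has to match. Both choices are assumptions on top of the expression above, and the constant name is made up for the example.

```ruby
# Short Unicode property names: L = Letter, M = Mark, P = Punctuation, Z = Separator.
# The \A and \z anchors are an addition so the entire input must match.
PERMISSIVE_NAME = /\A[\p{L}\p{M}\p{P}\p{Z}]+\z/

puts PERMISSIVE_NAME.match?("María-Jose")           # true
puts PERMISSIVE_NAME.match?("van Rossum")           # true
puts PERMISSIVE_NAME.match?("Björk Guðmundsdóttir") # true
puts PERMISSIVE_NAME.match?("O'Neil")               # true
puts PERMISSIVE_NAME.match?("R2-D2")                # false: digits are not in the class
```

This only filters out characters such as digits and symbols; it says nothing about capitalization or word order.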

But before you start constructing your regex for "validating" a name, please read this excellent piece on names by W3C. It will shake even your concepts of first, middle and last names.

For example:

In some cultures you are given a name (Björk, Osama) and an indication of who your father (or mother) was (Guðmundsdóttir, bin Mohammed). So the "first name" could be "Björk" but:

Björk wouldn’t normally expect to be called Ms. Guðmundsdóttir. Telephone directories in Iceland are sorted by given name.

But in other cultures, the first name is not a given name but a family name. In "Zhāng Mànyù", "Zhāng" is the family name. How to address her would depend on how well you know her, but again "Ms. Zhāng" would be strange.

The list of examples goes on and ends with some 30+ links to Wikipedia for more examples.

The article does end with suggestions for field design and some pointers on what characters to allow:

Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names. Don't require names to be entered all in upper case – this can be difficult on a mobile device. Allow the user to enter a name with spaces, eg. to support prefixes and suffixes such as de in French, von in German, and Jnr/Jr in American names, and also because some people consider a space-separated sequence of characters to be a single name, eg. Rose Marie.

Chris Wesseling

To answer your question about capital letters: in many areas of the world, names do not necessarily start with a capital letter. In Dutch for instance, you have surnames like "van der Vliet" where words like "van", "de", "den" and "der" are not capitalised. Additionally, you have special cases like "De fauw" and "Van pellicom" where an administrative error never got rectified, and the correct capitalisation is fairly illogical. Please do not make the mistake of rejecting such names.

I also know about town names in South Africa such as eThekwini, where the capital letter is not necessarily the first letter of the word. This could very well appear in surnames or given names as well.
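To make this concrete, here is a small Ruby sketch that runs the names mentioned above through the two patterns from the question. The constant names are made up for the example.

```ruby
# First pattern from the question: every word must start with an uppercase letter.
STRICT_PATTERN = /\A[[:upper:]][[:alpha:]'-]+( [[:upper:]][[:alpha:]'-]*)*\z/
# Second pattern from the question: at least one uppercase letter anywhere.
LENIENT_PATTERN = /\A([[:alpha:]'-]+ )*[[:alpha:]'-]*[[:upper:]][[:alpha:]'-]*( [[:alpha:]'-]+)*\z/

names = ["van der Vliet", "De fauw", "Van pellicom", "eThekwini"]

names.each do |name|
  strict  = STRICT_PATTERN.match?(name)
  lenient = LENIENT_PATTERN.match?(name)
  puts "#{name}: strict=#{strict} lenient=#{lenient}"
end
# Each of these names is rejected by STRICT_PATTERN and accepted by LENIENT_PATTERN.
```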

Lee White
  • I accounted for capital letters in the middle or end of a word of a name in my question. – at. Apr 09 '14 at 18:24