1

I want to validate a prename and surname for users to register at a worldwide social community.

I want to allow language-specific characters such as ÄÖÜÀýéè and a maximum of 3 prenames, separated by a space (if there is more than one prename). Underscores and other special characters should not be allowed, but the minus sign (-) should be.

That's all I can think of what should be allowed in a name. I don't know how it is in other countries than Germany, but here you can have up to 3 prenames and a surname can also have 2 words like "von Seidenfeld". Maybe you can also give me some more suggestions here, because I want to cover this name-validation-system for every full name around the world.

I also know I have to use a regular expression, but I don't know how to express only the conditions I just described.

What I have so far:

if (!preg_match('/^(\pL+\s+)$/u', $value)) echo 'error';

or view here: http://regex101.com/r/tE0uQ5 (sadly doesn't work, no matches)

AlexioVay
  • Short answer: You probably don't want to go down this rabbit hole. Long answer: http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ – David Oct 25 '13 at 18:31
  • @David Oh, well. So what do you suggest? How does Facebook do it? – AlexioVay Oct 25 '13 at 18:35
  • Does Facebook do it? What exactly does Facebook do? When I first signed up for Facebook years ago, my "name" was "Turbocharged Monkeybrain." Worked fine. – David Oct 25 '13 at 18:39
  • @David By standard naming conventions, that's technically a completely valid first and last name. Also, love the link. I posted the same in mine having not yet seen yours. – shortstuffsushi Oct 25 '13 at 18:40
  • Hyphenated forenames exist and hyphenated surnames are quite common – Quentin Oct 25 '13 at 19:04
  • And I know someone from college whose English given name is `Z` – Izkata Oct 25 '13 at 19:05

3 Answers

4

> That's all I can think of what should be allowed in a name.

Not even close.

> I don't know how it is in other countries than Germany, but here you can have up to 3 prenames and a surname can also have 2 words like "von Seidenfeld".

It's different in other countries. It's probably also different in Germany, just not for your name or the people's names that you're immediately thinking of right this moment.

> I want to cover this name-validation-system for every full name around the world.

This should be required reading on the subject. But to summarize... There is no way to do this. None. If you absolutely need a regular expression that will match a name, this might help.

By definition, anything that somebody enters into your system as their name is their name, at least as far as your system is concerned. It's how they've decided you should identify them. Is it their legal name in their jurisdiction? In your jurisdiction? Is it the name their parents call them? The name their friends call them? None of these things can be validated by a regular expression.

In short... Just accept any non-empty input as a name.
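As a minimal sketch of that advice in PHP, assuming UTF-8 input: the helper name `isAcceptableName` is hypothetical, and treating whitespace-only input as empty is my own assumption rather than something stated above.

```php
<?php
// Hypothetical helper: accepts any input that contains at least one visible character.
function isAcceptableName(string $input): bool
{
    // [^\p{Z}\p{C}] matches one character that is neither a separator (spaces)
    // nor a control/format character (tabs, newlines, etc.).
    // With the /u modifier, preg_match() returns false for invalid UTF-8,
    // so malformed input is rejected by the strict comparison as well.
    return preg_match('/[^\p{Z}\p{C}]/u', $input) === 1;
}

var_dump(isAcceptableName('Z'));      // bool(true)
var_dump(isAcceptableName("  \t\n")); // bool(false)
```

Nothing here inspects which characters the name contains; it only checks that something was entered.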

David
  • Yeah, I just read the link you gave me. Haven't thought of that. But what about special characters and numbers? – AlexioVay Oct 25 '13 at 18:40
  • Wanted to give +10 but SO allows only 1 – anubhava Oct 25 '13 at 18:40
  • @Vay: Characters are "special" in a purely subjective sense. Just because a character isn't part of a Romanic or Germanic alphabet doesn't make it "special." Seriously, there's no other way to describe this. It's been attempted countless times by countless developers and product owners. The only way to solve the problem is to not see it as a problem in the first place. – David Oct 25 '13 at 18:42
  • @Vay The exclamation point `!` is read as a click in some African languages, and is valid in the middle of a name. So no, even punctuation cannot be assumed. – Izkata Oct 25 '13 at 19:08
  • Do you guys know anything about capitalizing the name? So that the first letter would be uppercase. Should that affect any language? – AlexioVay Oct 25 '13 at 21:20
  • @Vay: That, like all others, is not a safe assumption. Are there any Arabic names which begin with "al"? German names which traditionally begin with "von"? The list goes on and on. – David Oct 25 '13 at 22:02
2

Forget name validation if it's international. Have an example, this is a totally valid Hungarian name:

dr. Lakatosné Dr. Bíró-Kis Imola

Imola is the given name; the rest is the family name, just to mix things up a bit more. Good luck validating that against anything. If you want more headache-inducing examples, consider artists, who may be legally allowed to register an artist name, which then becomes part of their official name.

Generally speaking, names have different origins. Some cultures, for example, had the habit of always adding the father's name in addition to the person's own name. The longest name I have heard of in my family has 7 (!) given names, not to mention the various prefixes, honorary titles, etc. that some people insist on using wherever they can and which are in fact part of their official name (they appear on their ID card). For a famous example of a really long name, look up Rudolph Valentino, who had a total of 9 names.

If you want to build a truly international system, create two fields, family name and given name, and let the user enter whatever they want. When displaying a name, be sure to take the name order expected by the person viewing the site into account. (Hungarian puts the family name first, for example.) Also, take great care to require input in only one of these fields, because some people may not have a family name.
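A rough sketch of that two-field approach in PHP follows. The function names, the `$familyNameFirst` flag, and the single-space join are all assumptions made for illustration; a real system would derive the order from the viewer's locale.

```php
<?php
// Hypothetical check: at least one of the two fields must contain a visible character.
function hasUsableName(string $givenName, string $familyName): bool
{
    $hasContent = static function (string $s): bool {
        // One character that is neither a separator nor a control character.
        return preg_match('/[^\p{Z}\p{C}]/u', $s) === 1;
    };

    return $hasContent($givenName) || $hasContent($familyName);
}

// Hypothetical formatter: orders the two parts for the viewer and skips an empty field.
function displayName(string $givenName, string $familyName, bool $familyNameFirst): string
{
    $parts = $familyNameFirst
        ? [$familyName, $givenName]
        : [$givenName, $familyName];

    // Drop a missing part so no stray space is produced.
    $parts = array_filter($parts, static function (string $p): bool {
        return $p !== '';
    });

    return implode(' ', $parts);
}

echo displayName('Imola', 'dr. Lakatosné Dr. Bíró-Kis', true), "\n";
```

With `$familyNameFirst` set to `true`, the example prints the Hungarian name above in its original order.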

Janos Pasztor
  • Thank you for that interesting information and tips! – AlexioVay Oct 25 '13 at 19:16
  • Wondering about the '.' Is that an abbreviation/title like Dr./Mr. in English? – sdanzig Oct 25 '13 at 19:44
  • Yes, dr. and Dr. are doctor titles (upper and lowercase matters). However, dr. Lakatosné is actually the name of the person's husband, so the dr. title in front comes with the marriage, whereas the second dr. comes from the person's own title. – Janos Pasztor Oct 26 '13 at 18:47
0

I'd suggest looking for invalid characters rather than valid ones, to simplify things:

^((?:(?:(?<!^)[ ])?[^*&@_#$\]\\\^\d ]+){1,3})$

I had to escape the three special characters at the end of the character class (], \, ^); the minus sign is left out of the class, so it remains allowed. I also added \d to filter out digits.

See it in action here! http://regex101.com/r/vY9wW5

I'm betting no one cares anymore, but I got this puppy working. The problem was that I hadn't realized how PHP strings need to be escaped (in a single-quoted string, only the already-escaped backslashes need to be doubled up), and that the regex testers, and maybe the online IDEs, were messing with my string in subtle ways.

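<?php
// Four sample names, one per line.
$sourcestring = "Uwe\nUwe-Jens\nGünther Kalle\nKalle Jens Chantal";
// Single-quoted string: only the literal backslash inside the class is doubled (\\\\).
// The outermost parentheses serve as the PCRE pattern delimiters here.
$regex = '((?:(?:(?<!^)[ ]|(?<=(?<=^|\n)))[^*&@_#$\]\\\\\^\d \n]+){1,3}(?=\n|$))';
preg_match_all($regex, $sourcestring, $matches);
// Dump all matches for inspection.
echo "<pre>" . print_r($matches, true);
?>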

That said, I think names containing unusual characters such as digits and symbols are rare enough that you can make those users work around a reality-check filter, in exchange for being able to filter out the button smashers from your login process.

sdanzig
  • I get an error here with your code: http://regex101.com/r/yG3cE9 – AlexioVay Oct 25 '13 at 18:51
  • Fixed to work with that tester.. although it's not perfect yet, it's enough to validate your list of names. – sdanzig Oct 25 '13 at 18:56
  • Works now with the tester, but not in my code. I tried: `if (preg_match('^([^&@*#_$\]\\\^\d]+( [^*&@_#$\]\\\^\d]+){0,2})$/u',$value))`. I get: Warning: preg_match(): Unknown modifier '&' – AlexioVay Oct 25 '13 at 19:06
  • Sorry, don't know much about RegEx – AlexioVay Oct 25 '13 at 19:10
  • Your code (now) at the bottom doesn't seem to work in my code. I can type in names with @ # etc. – AlexioVay Oct 25 '13 at 19:14
  • How do you know, some countries won't allow special characters in names? – Janos Pasztor Oct 25 '13 at 19:22
  • You don't. But I don't buy into it FULLY. I think it's reasonable to say names with numbers and underscores and dollar signs are rare enough in valid names that you can let those people call into customer service. Being able to filter out the guys who button smash their way through the login process is worth it, and I think this is a reasonable approach. However, what's not reasonable is PHP. I can work fine with a regex tester or Java, but I have NFC why my expression isn't working with preg_match :/ – sdanzig Oct 25 '13 at 19:37
  • @sdanzig Thinking the same as you. Sadly it doesn't work... – AlexioVay Oct 25 '13 at 21:04
  • @sdanzig preg actually uses the PCRE library, so please don't blame PHP. – Janos Pasztor Oct 26 '13 at 18:49
  • @Janoszen Will post it as another Stack Overflow question, and hopefully report back with an explanation. – sdanzig Oct 26 '13 at 19:10
  • Got it working! See updated answer, if anyone cares. – sdanzig Oct 26 '13 at 21:06
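
For anyone hitting the `Unknown modifier '&'` warning quoted in the comments above: that pattern has no delimiters, so PCRE takes the leading `^` as the opening delimiter, ends the pattern at the next `^`, and reads the `&` that follows as a modifier. Below is a hedged sketch of a delimited version of the same blacklist idea. The function name `looksLikeName` is hypothetical, the use of `\s` and `\z` is my own choice, and the character list is simply the one from this answer, not a recommendation.

```php
<?php
// Hypothetical validator: one to three space-separated words without blacklisted characters.
function looksLikeName(string $value): bool
{
    // One word: characters other than * & @ _ # $ ] \ ^, digits and whitespace.
    // In this single-quoted string, \\\\ yields the regex \\ (a literal backslash).
    $word = '[^*&@_#$\]\\\\\^\d\s]+';

    // The slashes are the delimiters; /u treats pattern and subject as UTF-8.
    // \z anchors at the very end, so a trailing newline is not accepted.
    $pattern = '/^' . $word . '(?: ' . $word . '){0,2}\z/u';

    return preg_match($pattern, $value) === 1;
}

foreach (['Uwe', 'Uwe-Jens', 'Günther Kalle', 'Kalle Jens Chantal', 'foo@bar'] as $name) {
    echo $name, ': ', looksLikeName($name) ? 'ok' : 'rejected', "\n";
}
```

Of these sample inputs, only `foo@bar` is rejected.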