My aim is to validate a last name by allowing it to contain only letters or a single quote (apostrophe). I do not know what the fastest way is; maybe a regex, I suppose. Anyway, so far I have this:

function check_surname($surname)
{
    $c = str_split($surname,1);
    $i = 0;
    $test = 1; // Wrong surname

    while($i < strlen($surname))
    {
       if(ctype_alpha($c[$i]) or $c[$i] == '\'')
       {
           $test = 0;
           $i++;
       }
       else
       {
           return false;
       }
    }
}

I can feel that something is wrong here but I can't see where it is. Could anyone help me out?

Joel
  • I wouldn't recommend [this](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/). If you still **need** to for some reason: `return preg_match("/^[a-z']+$/i", $surname);` – Sam Feb 01 '14 at 23:20
  • "My aim is to validate a last name by allowing it to only contain letters or a single quote." http://en.wikipedia.org/wiki/John_von_Neumann breaks this... – ceejayoz Feb 01 '14 at 23:24
  • @SamSullivan, would that work for John le Carré? – TRiG Feb 01 '14 at 23:24
  • No @TRiG, which is why I just used a comment not an answer and linked to this article: http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ – Sam Feb 01 '14 at 23:25
  • Yup. Good article, @SamSullivan. I've linked to it before on my own blog. Thing is, if the OP does want to restrict it to names containing letters, *é* is a letter, and it's not one that will be caught by `[a-z]`. – TRiG Feb 01 '14 at 23:30
  • Never forget diacritics, or you're alienating a large number of users. – Cyclone Feb 01 '14 at 23:33
  • Take a look at [this post](http://stackoverflow.com/questions/888838/regular-expression-for-validating-names-and-surnames). – Joel Feb 01 '14 at 23:35
  • Back to basics: why try to filter the last name at all? What about Russian names? Chinese? Indian? I'd recommend ensuring that the application works correctly with Unicode instead. – frlan Feb 01 '14 at 23:39
  • Hell, what about hyphens even? Mary Connor-O'Toole. – bishop Feb 01 '14 at 23:50

1 Answer

There are some good suggestions in the comments, and I definitely agree with @Cyclone that you should take into account diacritics (accented letters).

Fortunately, PHP regexes support Unicode character properties, so this is easy to do. Unicode defines a general category L that covers any letter (uppercase, lowercase, titlecase, modifier, and other letters). Matching on it allows accented letters in the name.
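
As a minimal sketch of how the L category behaves, assuming the input is UTF-8 and the pattern carries the `u` modifier (which makes PCRE treat both pattern and subject as UTF-8 rather than as single bytes). The helper name `is_letters_only` is made up for illustration:

```php
<?php
// Hypothetical helper, for illustration only.
// Assumption: $text is UTF-8 encoded, hence the "u" modifier on the pattern.
function is_letters_only(string $text): bool
{
    // \p{L} matches one character from any Unicode letter category.
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^\p{L}+$/u', $text) === 1;
}

foreach (['Carré', 'Neumann', 'Иванов', 'R2D2'] as $name) {
    echo $name, ': ', is_letters_only($name) ? 'letters only' : 'rejected', PHP_EOL;
}
```

The first three names consist only of letters and pass; `R2D2` is rejected because digits are not in category L.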

I would also recommend that you allow for hyphens (Catherine Zeta-Jones) and spaces (Guido van Rossum). Given all that, I would use the following regex:

preg_match("/^[\p{L} '-]+$/", $lname);
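
A slightly stricter variant can be sketched as follows. It is only one possible tightening, not a definitive rule set: it assumes UTF-8 input (hence the `u` modifier), and the function name `is_plausible_surname` and the "separators only between runs of letters" rule are illustrative assumptions. Unlike the character class above, it rejects strings that consist only of spaces, apostrophes, or hyphens:

```php
<?php
// Hypothetical helper; the name and the exact rules are assumptions for illustration.
// Assumption: $lname is UTF-8 encoded, hence the "u" modifier.
function is_plausible_surname(string $lname): bool
{
    // One or more letters, optionally followed by groups of
    // "one separator (space, apostrophe or hyphen) + one or more letters".
    $pattern = '/^\p{L}+(?:[ \'-]\p{L}+)*$/u';

    // preg_match() returns 1 on a match, 0 on no match, false on error
    // (for example, when the subject is not valid UTF-8).
    return preg_match($pattern, $lname) === 1;
}

$samples = ["Zeta-Jones", "van Rossum", "Connor-O'Toole", "le Carré", "--'--", ""];
foreach ($samples as $sample) {
    echo '[', $sample, '] ', is_plausible_surname($sample) ? 'accepted' : 'rejected', PHP_EOL;
}
```

The first four samples are accepted; `--'--` and the empty string are rejected. Whether this extra strictness is worth it depends on what the validation is meant to prevent.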
Ethan Brown
  • Thank you all very much! – user3159187 Feb 02 '14 at 00:00
  • Doesn't work unless you enable Unicode mode, and it accepts `--' --` as a good name... – Walter Tross Feb 02 '14 at 00:17
  • Yes it worked fine! I am going to refresh my poor knowledge of regex and consider all your precious pieces of advice. For the time being I am going to use your solution! Thanks very much – user3159187 Feb 02 '14 at 00:19
  • Walter, it will work whether you enable Unicode or not. If the input string is not Unicode, PHP's regex engine will treat `\p{L}` as `[a-zA-Z]`, essentially. As for pathological cases like `--'--`, there's such a thing as going _too_ far with validation. A lot of it depends on what you're trying to stop. If you're just trying to generally prevent mistakes, this is probably fine. If you have asshole users who are going to try things like `--'--`, well, maybe you need something a little stronger, in which case I would use multiple regexes and maybe some heuristics on top of that. – Ethan Brown Feb 02 '14 at 00:43