6

I'd like a Regular Expression for C# that matches "Johnson", "Del Sol", or "Del La Range"; in other words, it should match words with spaces in the middle but no space at the start or at the end.

Chad Birch
Caveatrob
  • 1
    What does the input string look like? Is the last name the only part of the string, or is it a sentence, or possibly a full name with optionally more spaces? I think context is important here. – Rich Mar 10 '09 at 21:37

7 Answers

5
^\p{L}+(\s+\p{L}+)*$

This regex has the following features:

  • Will match a one-letter last name (e.g. Malcolm X's last name)
  • Will not match last names containing digits (patterns built on \w or [^ ] will)
  • Matches Unicode letters, not just a-z and A-Z

But what about last names like "O'Connor" or hyphenated last names ... hmm ...
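
To see how the pattern behaves against the names from the question, here is a minimal C# sketch. The class name LastNameCheck, the helper IsLastName, and the sample inputs are made up for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastNameCheck
{
    // Pattern from the answer: runs of Unicode letters separated by whitespace.
    private static readonly Regex Pattern = new Regex(@"^\p{L}+(\s+\p{L}+)*$");

    // Hypothetical helper: true when the whole input looks like a last name.
    public static bool IsLastName(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        string[] samples =
        {
            "Johnson", "Del Sol", "Del La Range", "X",   // accepted
            " Johnson", "Johnson ", "R2D2", "O'Connor"   // rejected
        };

        foreach (string s in samples)
        {
            Console.WriteLine($"\"{s}\" -> {IsLastName(s)}");
        }
    }
}
```

One thing to watch for in .NET: without RegexOptions.Multiline, $ also matches just before a final newline, so swap it for \z if a trailing "\n" must be rejected too.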

Daniel LeCheminant
3

This should do the job:

^[a-zA-Z][a-zA-Z ]*[a-zA-Z]$

Edit: Here's a slight improvement that allows one-letter names and hyphens/apostrophes in the name:

^[a-zA-Z'][a-zA-Z' -]*[a-zA-Z']?$
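
A rough C# sketch of how the improved pattern behaves. Two assumptions on my side: the hyphen is written last inside the character class so it is read as a literal (an apostrophe followed by "- " would be parsed as a range running backwards, which .NET rejects), and the Tightened variant is my own suggestion, not part of the answer. The class and member names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PunctuatedNameCheck
{
    // The improved pattern, with the hyphen moved to the end of the class.
    public static readonly Regex Relaxed =
        new Regex(@"^[a-zA-Z'][a-zA-Z' -]*[a-zA-Z']?$");

    // Hypothetical variant: the tail is optional as a whole, so a name
    // longer than one character must end in a letter or apostrophe.
    public static readonly Regex Tightened =
        new Regex(@"^[a-zA-Z']([a-zA-Z' -]*[a-zA-Z'])?$");

    public static void Main()
    {
        string[] samples = { "X", "O'Connor", "Smith-Jones", "Del Sol", " Del", "Del ", "a--" };

        foreach (string s in samples)
        {
            Console.WriteLine(
                $"\"{s}\": relaxed={Relaxed.IsMatch(s)}, tightened={Tightened.IsMatch(s)}");
        }
    }
}
```

Because the last character class in the relaxed pattern is optional, the middle class can swallow a trailing space or hyphen, so "Del " and "a--" pass there; the tightened variant rejects both while still accepting "X".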
Noldorin
  • Malcolm X would not be happy about this... (requiring minimum of 2 letter last names that is...) – Daniel LeCheminant Mar 10 '09 at 21:42
  • The shortest REAL name I can think of is "Ng." Should be fine. ;) – Samantha Branham Mar 10 '09 at 21:48
  • Yeah, I noticed that upon review, but didn't bother changing because I didn't consider a one-letter last name... Post is edited now anyway with a few other improvements. – Noldorin Mar 10 '09 at 23:42
  • +1 for tackling ' and -. (I don't know if the first character needs to accept an apostrophe though... or if a-- should be a valid last name) – Daniel LeCheminant Mar 11 '09 at 00:00
  • @Daniel: Cheers. And yeah, it *probably* doesn't need to accept ' as the first char, but can't hurt. Note that it shouldn't accept a hyphen as the last char, so a-b would be valid but not a-- (unless one of my quantifiers is wrong). – Noldorin Mar 11 '09 at 00:07
  • How would I change this to only allow single spaces inside the name, not more than one space? – Caveatrob Mar 11 '09 at 20:53
  • I take it you mean not more than one space in a row? Try the following (it may not quite work, as I haven't tested): ^[a-zA-Z'](([a-zA-Z])+[' -]?)*[a-zA-Z']?$ – Noldorin Mar 11 '09 at 22:04
3

In the name "Ṣalāḥ ad-Dīn Yūsuf ibn Ayyūb" (see http://en.wikipedia.org/wiki/Saladdin), which is the first name, and which is the last? What about in the name "Roberto Garcia y Vega" (invented)? "Chiang Kai-shek" (see http://en.wikipedia.org/wiki/Chang_Kai-shek)?

Spaces in names are the least of your problems! See Personal names in a global application: What to store.

Community
John Saunders
  • I agree. No matter how hard you try you will always find names that don't match correctly. I mean, if you don't have complete control on what names you are parsing. – Sergio Acosta Mar 10 '09 at 22:36
0

Here's a better one:

/^[a-zA-Z]+(([\'\,\.\- ][a-zA-Z ])?[a-zA-Z]*)*$/

Allows standard punctuation and spaces, but cannot start with punctuation.

Jason
0

The ? modifier is your friend. Placed after a quantifier such as +, it makes a shortest-possible (lazy) match instead of a greedy one. Use it for the first name, as in:

^(.+?) (.+)$

Group 1 grabs everything up to the first space, group 2 gets the rest.

Of course, now what do you do if the first name contains spaces?
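
For what it's worth, a small C# sketch of the lazy split described above. NameSplitter and TrySplit are names invented for this example, and the sample names are assumptions; the pattern is the one from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameSplitter
{
    // Group 1 is lazy (.+?), so it stops at the first space;
    // group 2 greedily takes everything after that space.
    private static readonly Regex Pattern = new Regex(@"^(.+?) (.+)$");

    // Hypothetical helper following the usual Try... convention.
    public static bool TrySplit(string fullName, out string first, out string rest)
    {
        Match m = Pattern.Match(fullName);
        first = m.Success ? m.Groups[1].Value : null;
        rest = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        foreach (string name in new[] { "Maria Del Sol", "Mary Ann Smith", "Johnson" })
        {
            if (TrySplit(name, out string first, out string rest))
                Console.WriteLine($"first=\"{first}\", rest=\"{rest}\"");
            else
                Console.WriteLine($"\"{name}\": no space to split on");
        }
    }
}
```

"Maria Del Sol" splits into "Maria" and "Del Sol", but "Mary Ann Smith" splits into "Mary" and "Ann Smith", which is exactly the first-name-with-spaces problem.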

Paul Roub
  • Nice and simple, but I think it will match "238 39592" as well, which aren't words. – Samantha Branham Mar 10 '09 at 21:26
  • then replace "." with "\w" or "[a-zA-Z]" – Rich Mar 10 '09 at 21:34
  • Not sure if the OP wants to match the last name by itself or within a string containing both the first and last names... I supposed the former, while you seem to have done the latter. Still, it appears your regex allows spaces at the start or end, which needs to be fixed. – Noldorin Mar 10 '09 at 21:40
0

Try something like this:

^[^\s][\w\s]*[^\s]$
Andrew Hare
-1

I think this is more what you were looking for:

^[^ ][a-zA-Z ]+[^ ]$

This requires a non-space character at the start of the line, then one or more letters or spaces, then a non-space character at the end.

This works in irb, and the last time I worked with C# I used similar regexes:

(zero is good, nil means failed)

>> "Di Giorno" =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> 0
>> "DiGiorno" =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> 0
>> " DiGiorno" =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> nil
>> "DiGiorno " =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> nil
>> "Di Gior no" =~ /^[^ ][a-zA-Z ]+[^ ]$/
=> 0
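
The same checks could be ported to C# roughly like this. It is only a sketch: the class name IrbPortCheck and the helper Matches are invented here, and Regex.IsMatch returns a bool rather than Ruby's index-or-nil.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IrbPortCheck
{
    // Same pattern as in the irb session, minus the /.../ delimiters.
    private static readonly Regex Pattern = new Regex(@"^[^ ][a-zA-Z ]+[^ ]$");

    // Hypothetical helper: true where irb printed 0, false where it printed nil.
    public static bool Matches(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        string[] samples = { "Di Giorno", "DiGiorno", " DiGiorno", "DiGiorno ", "Di Gior no" };

        foreach (string s in samples)
        {
            Console.WriteLine($"\"{s}\" -> {Matches(s)}");
        }
    }
}
```

Note that the pattern needs at least three characters (one for each of its three parts), so a two-letter name such as "Ng" does not match, and [^ ] lets digits or punctuation through at either end.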
dexedrine