
Simple problem: an existing project lets me add extra fields (each with a regular expression as an additional check) to support custom input forms. I need to add a new form but cannot change how the project works. The form allows a visitor to enter their first and last name plus initials, so the RegEx ^[a-zA-Z.]*$ worked just fine until now.
Then someone noticed that it wouldn't accept letters with diacritics. A Turkish name like Ömür was rejected as invalid, but it needs to be accepted.

So I have two options:

  1. Remove the check completely, which would allow users to enter garbage.
  2. Write a regular expression that also accepts letters with diacritics but still rejects digits, spaces and other non-letters.

Since I cannot change the code of the project, I only have these two options. I would prefer option 2 but now wonder what the proper RegEx should be. (The project is written in C# 4.0.)

Wim ten Brink
  • What are you going to do about someone who legally changes the written form of their name to be the character sequence “42 79”? Some people do stupid stuff like that… – Donal Fellows Jan 19 '12 at 09:34
  • Well, someone named "42 79" would be entered as "Fourtytwo Zeventynine". :-) Besides, not all countries allow their citizens to be this stupid. :-) – Wim ten Brink Jan 20 '12 at 10:56
  • Leaving aside local regulation, if my name was “42 79” and someone put it in some poxy DB as “Fourtytwo Zeventynine”, I would demand that they change their DB as it would be _formally_ incorrect. More to the point, people _do_ have multi-word family names (that might or might not be easy to capitalize) and family names with apostrophes in (common in Irish surnames) and a whole host of other things. Names are tough to validate. – Donal Fellows Jan 21 '12 at 10:25

1 Answer


You can use the specific Unicode escape for letters - \p{L} (this will include the A-Za-z ranges):

^[.\p{L}]*$

See regular-expressions.info:

\p{L} or \p{Letter}

Matches a single Unicode code point that has the property "letter". See Unicode Character Properties in the tutorial for a complete list of properties. Each Unicode code point has exactly one property. Can be used inside character classes.
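
To see how the pattern behaves before putting it into the form configuration, a small console check along these lines might help. This is only a sketch: the NameValidator class, the IsValidName method and the sample inputs are made up for illustration and are not part of the project in the question. Only Regex and its IsMatch method come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only; the real project only needs the pattern string.
public static class NameValidator
{
    // Pattern from the answer: any number of Unicode letters and dots (including none).
    private static readonly Regex LettersAndDots = new Regex(@"^[.\p{L}]*$");

    public static bool IsValidName(string input)
    {
        return LettersAndDots.IsMatch(input);
    }
}

public static class Program
{
    public static void Main()
    {
        string[] samples =
        {
            "Ömür",            // precomposed letters with diacritics: accepted
            "J.R.R.",          // initials with dots: accepted
            "R2D2",            // contains digits: rejected
            "ten Brink",       // contains a space: rejected
            "O'Brien",         // contains an apostrophe: rejected
            "O\u0308mu\u0308r" // decomposed form (letter + combining mark): rejected
        };

        foreach (string s in samples)
        {
            Console.WriteLine("{0}: {1}", s, NameValidator.IsValidName(s));
        }
    }
}
```

The last sample is worth noting: \p{L} covers letters only, so a name that arrives in decomposed form (a base letter followed by a combining diaeresis, U+0308) is rejected. If that turns out to matter for the form, adding \p{M} to the character class, as in ^[.\p{L}\p{M}]*$, is one regex-only way to allow combining marks.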

Oded