
I want to return a bool that indicates whether a string contains only letters, numbers, spaces, and dashes. The letters can come from foreign alphabets, so characters such as é, à, î, or ô must be accepted. This is the line of code I have:

if (Regex.IsMatch(this.UserFirstName, @"^[a-zA-Z0-9À-ž_ -]") == false)
{ return false; }

However, it returns true even if the string contains #@!. I also tried changing the ^ to see whether it makes a difference, but it doesn't.


What do I need to change to get the expected result, that is, returning false when the string contains characters such as #, @, or !?

Wiktor Stribiżew
frenchie
  • This may give some pointers: http://stackoverflow.com/questions/15131632/allow-only-letters-and-special-letters-%C3%A9%C3%A8%C3%A0-etc-through-a-regex – Charles Mager May 14 '15 at 21:37

1 Answer


The main issue with your regex is that it only checks the first character: the ^ anchor pins the match to the start of the string, and the character class has no quantifier, so exactly one occurrence is tested. Thus, Sylvain#@! passes because it starts with an allowed character.
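To see this behaviour in isolation, here is a minimal sketch that runs the pattern from the question against two inputs. The class and method names (FirstCharOnlyDemo, MatchesOriginal, FirstMatch) are made up for this illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not from the question.
public static class FirstCharOnlyDemo
{
    // The pattern from the question: start anchor, one character class, no quantifier.
    public const string OriginalPattern = @"^[a-zA-Z0-9À-ž_ -]";

    public static bool MatchesOriginal(string input) =>
        Regex.IsMatch(input, OriginalPattern);

    // Returns the text that the pattern actually consumed.
    public static string FirstMatch(string input) =>
        Regex.Match(input, OriginalPattern).Value;

    public static void Main()
    {
        Console.WriteLine(MatchesOriginal("Sylvain#@!")); // True: only "S" is checked
        Console.WriteLine(FirstMatch("Sylvain#@!"));      // S
        Console.WriteLine(MatchesOriginal("#Sylvain"));   // False: first character is not allowed
    }
}
```

The match consumes a single character, so everything after the first position is never inspected.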

You need the \p{L} Unicode category class to match all Unicode letters, together with both the ^ and $ anchors and a + quantifier:

if (Regex.IsMatch(this.UserFirstName, @"^[\p{L}\p{N}\p{Zs}_-]+$") == false)
{
  return false; 
}

See regexstorm demo
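As a rough sketch, the check could be wrapped in a small helper so it is easy to try against a few inputs. The names NameValidator and IsValidName are assumptions for this example; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the question.
public static class NameValidator
{
    // Pattern from the answer: one or more letters, numbers, space separators, '_' or '-'.
    private static readonly Regex AllowedChars =
        new Regex(@"^[\p{L}\p{N}\p{Zs}_-]+$");

    public static bool IsValidName(string name) =>
        name != null && AllowedChars.IsMatch(name);

    public static void Main()
    {
        Console.WriteLine(IsValidName("Sylvain"));       // True
        Console.WriteLine(IsValidName("Jean-Pierre é")); // True
        Console.WriteLine(IsValidName("Sylvain#@!"));    // False
        Console.WriteLine(IsValidName(""));              // False: + requires at least one character
    }
}
```

Note that in .NET, $ also matches just before a trailing newline, so \z may be preferable to $ if a final line break must be rejected as well.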

In the regex, I also replaced 0-9 with the \p{N} class (Unicode numeric characters) and the literal space with \p{Zs} (Unicode space separators). See more details at MSDN Supported Unicode General Categories. If you plan to allow only "regular" digits and only a regular space, then just keep your 0-9 range and the literal space in the regex:

@"^[\p{L}0-9 _-]+$"
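The difference between the two patterns can be illustrated with inputs that contain a non-ASCII digit or a no-break space. This is only a sketch; the class name DigitAndSpaceDemo and the sample strings are chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not from the question.
public static class DigitAndSpaceDemo
{
    public const string UnicodePattern = @"^[\p{L}\p{N}\p{Zs}_-]+$";
    public const string AsciiDigitPattern = @"^[\p{L}0-9 _-]+$";

    public static bool IsMatch(string input, string pattern) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        string arabicDigit = "Ana\u0663";    // ends with U+0663 ARABIC-INDIC DIGIT THREE
        string noBreakSpace = "Ana\u00A0Lu"; // contains U+00A0 NO-BREAK SPACE
        string plain = "Ana Lu 3";           // ASCII space and digit only

        Console.WriteLine(IsMatch(arabicDigit, UnicodePattern));     // True
        Console.WriteLine(IsMatch(arabicDigit, AsciiDigitPattern));  // False
        Console.WriteLine(IsMatch(noBreakSpace, UnicodePattern));    // True
        Console.WriteLine(IsMatch(noBreakSpace, AsciiDigitPattern)); // False
        Console.WriteLine(IsMatch(plain, AsciiDigitPattern));        // True
    }
}
```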
Wiktor Stribiżew
  • Glad to help. Also, please note that if you plan to accept combining marks, you also need to add `\p{M}` to the character class: `@"^[\p{L}\p{M}\p{N}\p{Zs}_-]+$"` – Wiktor Stribiżew May 14 '15 at 21:49
  • Please refer to [Characters, Code Points, and Graphemes or How Unicode Makes a Mess of Things](http://www.regular-expressions.info/unicode.html): *In Unicode, `à` can be encoded as two code points: U+0061 (a) followed by U+0300 (grave accent)... The Unicode code point U+0300 (grave accent) is a combining mark.* – Wiktor Stribiżew May 14 '15 at 22:03
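
To illustrate the point about combining marks, the following sketch compares the two patterns on a decomposed à (U+0061 followed by U+0300) and on the precomposed form (U+00E0). The class name CombiningMarkDemo and the sample word are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not from the question.
public static class CombiningMarkDemo
{
    public const string WithoutMarks = @"^[\p{L}\p{N}\p{Zs}_-]+$";
    public const string WithMarks = @"^[\p{L}\p{M}\p{N}\p{Zs}_-]+$";

    public static bool IsMatch(string input, string pattern) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        string decomposed = "Voila\u0300"; // 'a' followed by U+0300 COMBINING GRAVE ACCENT
        string precomposed = "Voil\u00E0"; // single code point U+00E0

        Console.WriteLine(IsMatch(decomposed, WithoutMarks));  // False: U+0300 is a mark, not a letter
        Console.WriteLine(IsMatch(decomposed, WithMarks));     // True
        Console.WriteLine(IsMatch(precomposed, WithoutMarks)); // True
        Console.WriteLine(IsMatch(precomposed, WithMarks));    // True
    }
}
```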