13

I have a regular expression:

return Regex.IsMatch(_customer.FirstName, @"^[A-Za-z][A-Za-z0-9@#%&\'\-\s\.\,*]*$");

Now, some of the customers have a fada over a vowel in their surname or first name, like the following: Brendán

Note the fada over the a, which you can type by holding down Alt and Ctrl and then pressing a.

I have tried adding these characters into the regular expression but I get an error when the program tries to compile.

The only way I can allow the user to enter a character with a fada is to remove the regular expression completely, which means the user can enter anything they want.

Is there any way to use the above expression and somehow allow the following characters?

á
é
í
ó
ú
jezrael
Kev

5 Answers

22

Just for reference, you don't need to escape ' , and . inside a character class [], and you can avoid escaping the dash - by placing it at the beginning or end of the class.

You can use \p{L} which matches any kind of letter from any language. See the example below:

string[] names = { "Brendán", "Jóhn", "Jason" };
Regex rgx      = new Regex(@"^\p{L}+$");
foreach (string name in names)
    Console.WriteLine("{0} {1} a valid name.", name, rgx.IsMatch(name) ? "is" : "is not");

// Brendán is a valid name.
// Jóhn is a valid name.
// Jason is a valid name.

Or simply add the characters you want to allow to your character class []:

@"^[a-zA-Z0-9áéíóú@#%&',.\s-]+$"
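
The two ideas above can also be combined. The following is only a sketch, assuming you want to keep the question's rule that a name starts with a letter and may then contain letters, digits and the listed punctuation. The class and method names (`NameValidator`, `IsValidName`) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Assumed pattern: the first character is a letter from any script (\p{L});
    // the rest may be letters, ASCII digits, whitespace, or the punctuation
    // from the question. The hyphen sits last in the class, so it needs no escape.
    private static readonly Regex NamePattern =
        new Regex(@"^\p{L}[\p{L}0-9@#%&'\s.,*-]*$");

    public static bool IsValidName(string name)
    {
        return name != null && NamePattern.IsMatch(name);
    }

    public static void Main()
    {
        // \u00E1 is á and \u00D3 is Ó, written as escapes to avoid source-encoding issues.
        string[] samples = { "Brend\u00E1n", "O'Brien", "Se\u00E1n-\u00D3g", "1Brend\u00E1n" };
        foreach (string name in samples)
            Console.WriteLine("{0}: {1}", name, IsValidName(name));
    }
}
```

Note that \p{L} accepts letters from every script, not just the five fada vowels, so this is looser than listing the characters explicitly.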
hwnd
9

Try incorporating \p{L}, which matches any Unicode "letter". Both a and á match \p{L}.
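
One caveat worth checking: á can also arrive as a plain a followed by the combining acute accent U+0301, and that combining mark is not in \p{L}. A minimal sketch, assuming you are happy to normalise input to composed form (NFC) before validating; the class and method names are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class FadaNormalization
{
    private static readonly Regex LettersOnly = new Regex(@"^\p{L}+$");

    // Matches the string exactly as given.
    public static bool IsLettersOnlyRaw(string name)
    {
        return LettersOnly.IsMatch(name);
    }

    // Composes base letter + combining mark into one code point where
    // a precomposed form exists, then matches.
    public static bool IsLettersOnlyNormalized(string name)
    {
        string composed = name.Normalize(NormalizationForm.FormC);
        return LettersOnly.IsMatch(composed);
    }

    public static void Main()
    {
        // 'a' followed by U+0301 COMBINING ACUTE ACCENT (decomposed á)
        string decomposed = "Brenda\u0301n";
        Console.WriteLine(IsLettersOnlyRaw(decomposed));
        Console.WriteLine(IsLettersOnlyNormalized(decomposed));
    }
}
```

If you would rather not normalise, adding \p{M} (marks) to the character class is another option, at the cost of accepting any combining mark.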

Zoe
AFrieze
5

To expand your regular expression to include vowels with an acute accent (fada), you can use Unicode code points. The accented vowels used here sit in the Latin-1 Supplement block (U+0080 to U+00FF).

More Unicode code charts are at http://www.unicode.org/charts/index.html#scripts, covering Latin Extended-B, -C and -D and Latin Extended Additional (which ought to cover pretty much every European language in its entirety).

So, we see that the Irish fada vowels are

  • Á is \u00C1; á is \u00E1
  • É is \u00C9; é is \u00E9
  • Í is \u00CD; í is \u00ED
  • Ó is \u00D3; ó is \u00F3
  • Ú is \u00DA; ú is \u00FA

And thus your regular expression needs to be extended:

Regex rx = new Regex( @"^[A-Za-z\u00C1\u00C9\u00CD\u00D3\u00DA\u00E1\u00E9\u00ED\u00F3\u00FA][A-Za-z\u00C1\u00C9\u00CD\u00D3\u00DA\u00E1\u00E9\u00ED\u00F3\u00FA0-9@#%&\'\-\s\.\,*]*$");
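
Since the same ten code points appear twice, the pattern may be easier to read if they are pulled into a constant. This is just a sketch of that refactoring; `FadaPattern`, `FadaVowels` and `IsValidName` are names invented here, and the punctuation set is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FadaPattern
{
    // Á É Í Ó Ú á é í ó ú, written as C# string escapes.
    private const string FadaVowels =
        "\u00C1\u00C9\u00CD\u00D3\u00DA\u00E1\u00E9\u00ED\u00F3\u00FA";

    // First character: ASCII letter or fada vowel.
    // Remaining characters: the same letters plus digits and punctuation.
    private static readonly Regex NamePattern = new Regex(
        "^[A-Za-z" + FadaVowels + "]" +
        "[A-Za-z" + FadaVowels + @"0-9@#%&'\s.,*-]*$");

    public static bool IsValidName(string name)
    {
        return name != null && NamePattern.IsMatch(name);
    }

    public static void Main()
    {
        // \u00ED is í, \u00C1 is Á, \u00EB is ë (not a fada vowel).
        string[] samples = { "S\u00EDle", "\u00C1ine", "Zo\u00EB" };
        foreach (string name in samples)
            Console.WriteLine("{0}: {1}", name, IsValidName(name));
    }
}
```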
Nicholas Carey
1

\w (word characters) includes Unicode letters.

So your expression could be:

@"^\w[\w0-9@#%&\'\-\s\.\,*]*$"

(Replacing A-Za-z with \w)

driis
  • I thought the same thing, but it doesn't actually work as I expected either. http://regex101.com/r/pG5kS5 – Mike Perrenoud Dec 17 '13 at 18:05
  • The problem with the word character class (`\w`) is that it matches a lot of stuff: Unicode letters — categories `Ll` (lower-case), `Lu` (upper-case), `Lt` (title case), `Lo` (letter, other), `Lm` (letter, modifier), `Nd` (number, decimal digit...which includes more than just ASCII 0-9) and `Pc` (punctuation, connector). – Nicholas Carey Dec 17 '13 at 18:06
  • @MikePerrenoud There's no guarantee that PHP's regex library matches the behavior of C#'s, even if they're both PCRE. You can see from that link that the Python regex engine matches differently. – jpaugh Jul 14 '20 at 21:48
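
To see the difference the comments describe, here is a small sketch comparing \w and \p{L} on single characters with .NET's default regex options. The class and method names are made up for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    public static bool IsWordChar(string s)
    {
        return Regex.IsMatch(s, @"^\w$");
    }

    public static bool IsLetter(string s)
    {
        return Regex.IsMatch(s, @"^\p{L}$");
    }

    public static void Main()
    {
        // \u00E1 is á; \u0663 is ARABIC-INDIC DIGIT THREE (category Nd).
        string[] samples = { "\u00E1", "_", "7", "\u0663" };
        foreach (string s in samples)
            Console.WriteLine("{0}: \\w={1}, \\p{{L}}={2}", s, IsWordChar(s), IsLetter(s));
    }
}
```

So \w as the first character would let a name start with a digit or an underscore, which the original A-Za-z did not.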
0

Try the pattern below, which adds the accented characters directly to the character class:

return Regex.IsMatch(_customer.FirstName, @"^[0-9A-Za-z@#%&\'\-\s\.\,ñáéíóúü]+$");
Pandian