13

I'd like to restrict my form input so that non-English characters cannot be entered: for example, all Chinese, Japanese and Cyrillic characters, but also single characters like à, â, ù, û, ü, ô, î, ê. Would this be possible? Do I have to set up a locale on my MVC application, or should I just do a regex validation on the textbox? As a side note, I still want to be able to enter numbers and other characters; I only want to exclude letters.

Please advise, thank you.

bobek

5 Answers

11

For this you have to use Unicode character properties and blocks. Each Unicode code point is assigned some properties, e.g. that the code point is a letter. Blocks are ranges of code points.

Those Unicode Properties and blocks are written \p{Name}, where "Name" is the name of the property or block.

When it is an uppercase "P" like this \P{Name}, then it is the negation of the property/block, i.e. it matches anything else.
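The difference between the lowercase and uppercase form can be sketched in a few lines. This is only an illustration; the class and method names are hypothetical and not part of the original answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PropertyDemo
{
    // True when every character is a Unicode letter (general category L).
    public static bool IsAllLetters(string s) => Regex.IsMatch(s, @"^\p{L}+$");

    // True when no character is a Unicode letter (negated property \P{L}).
    public static bool HasNoLetters(string s) => Regex.IsMatch(s, @"^\P{L}+$");
}
```

With these helpers, IsAllLetters accepts "Föobar" because ö is a letter too, while HasNoLetters accepts "123-456" and rejects "12a".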

Some of the available properties (only a short excerpt):

  • L ==> All letter characters.
  • Lu ==> Letter, Uppercase
  • Ll ==> Letter, Lowercase
  • N ==> All numbers. This includes the Nd, Nl, and No categories.
  • Pc ==> Punctuation, Connector
  • P ==> All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.
  • Sm ==> Symbol, Math

Some of the available blocks (only a short excerpt):

  • 0000 - 007F ==> IsBasicLatin
  • 0400 - 04FF ==> IsCyrillic
  • 1000 - 109F ==> IsMyanmar
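A named block can be used on its own to detect characters from one script range. The following is a minimal sketch; the class and method names are made up for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class BlockDemo
{
    // True when at least one character lies in the Cyrillic block (0400 - 04FF).
    public static bool ContainsCyrillic(string s) => Regex.IsMatch(s, @"\p{IsCyrillic}");
}
```

ContainsCyrillic finds a match in "Привет" and none in "Hello".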

What I used in the solution:

\P{L} is a character property that matches any character that is not a letter ("L" for Letter).

\p{IsBasicLatin} is a Unicode block that matches the code points 0000 - 007F

So your regex would be:

^[\P{L}\p{IsBasicLatin}]+$

In plain words:

This matches a string from start to end (^ and $) when it consists of one or more characters, each of which is either a non-letter or a character from the ASCII table (code points 0000 - 007F).

A short C# test method (it needs using System; and using System.Text.RegularExpressions;):

string[] myStrings = { "Foobar",
    "Foo@bar!\"§$%&/()",
    "Föobar",
    "fóÓè"
};

Regex reg = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");

foreach (string str in myStrings) {
    Match result = reg.Match(str);
    if (result.Success)
        Console.Out.WriteLine("matched ==> " + str);
    else
        Console.Out.WriteLine("failed ==> " + str);
}

Console.ReadLine();

Prints:

matched ==> Foobar
matched ==> Foo@bar!"§$%&/()
failed ==> Föobar
failed ==> fóÓè

stema
  • Wouldn't `[\P{L}\p{IsBasicLatin}]` match non-English non-letters? For example other kinds of digits like ٠١٢٣٤? I don't think that was desired. Seems like he just wants to match basic ASCII characters. – Qtax Mar 14 '13 at 15:35
  • Yes, of course. That is what I understood: just exclude non-ASCII letters (and match all other Unicode chars). If this understanding is wrong, the solution is very simple and is already here in the accepted answer, but that is very basic regex knowledge and would not justify a bounty. – stema Mar 14 '13 at 18:15
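The behaviour discussed in these two comments can be checked with a small sketch. The wrapper class is hypothetical; the pattern is the one from the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's pattern, for illustration only.
public static class LetterRule
{
    // Every character must be a non-letter or lie in the Basic Latin block.
    private static readonly Regex Pattern = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");

    public static bool Accepts(string s) => Pattern.IsMatch(s);
}
```

Arabic-Indic digits such as ٠١٢٣٤ are numbers rather than letters, so Accepts returns true for them, while a single Cyrillic letter such as Ж is rejected.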
1

You can use a RegularExpression attribute on your view model to restrict that:

public class MyViewModel
{
    [System.ComponentModel.DataAnnotations.RegularExpression("[a-zA-Z]+")]
    public string MyEntry
    {
       get;
       set;
    }
}
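To also allow digits and punctuation, the attribute could carry a broader pattern, such as the Unicode-property one from the other answer. The sketch below assumes that pattern; the class names, property name and error message are hypothetical. RegularExpressionAttribute only accepts a value when the pattern matches the whole string, so no ^ and $ are needed, and it treats null or empty input as valid.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical view model; names and message are illustrative only.
public class EntryViewModel
{
    [RegularExpression(@"[\P{L}\p{IsBasicLatin}]+",
        ErrorMessage = "Only English letters are allowed.")]
    public string Entry { get; set; } = string.Empty;
}

public static class EntryValidation
{
    // Runs the data-annotation validators the way server-side model validation would.
    public static bool IsValid(EntryViewModel model)
    {
        var results = new List<ValidationResult>();
        // validateAllProperties: true is required so that RegularExpression is evaluated.
        return Validator.TryValidateObject(model, new ValidationContext(model), results, true);
    }
}
```

If client-side validation is enabled, the same pattern is handed to the browser's JavaScript regex engine, which as far as I know does not understand .NET block names such as IsBasicLatin, so the server-side check should stay in place.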
codingbiz
  • This matches a-z and A-Z. How about 0-9 and other characters: /.,;'[]-= and so on? – bobek Mar 11 '13 at 16:33
  • I added an answer. Is it going in the direction you expect from a "*A detailed canonical answer*"? – stema Mar 13 '13 at 07:42
1

You can use the regex [\x00-\x7F]+ or [\u0000-\u007F]+ (ASCII ends at 7F; an upper bound of 80 would also admit the control character U+0080). Haven't tested, but I think it should work in C# also.

Adapted from: Regular expression to match non-English characters?

You can use regex validation for the textbox and validate on the server as well.
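For validation the range has to be anchored, otherwise a single ASCII character anywhere in the input is enough for a match. A minimal server-side sketch, with a hypothetical class name:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class AsciiRule
{
    // Anchored so that every character must fall in the ASCII range 00 - 7F.
    private static readonly Regex Pattern = new Regex(@"^[\x00-\x7F]+$");

    public static bool Accepts(string s) => Pattern.IsMatch(s);
}
```

Note that this is stricter than what the question asks for: it rejects every non-ASCII character, including non-letters such as other scripts' digits.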

publicgk
1

Maybe this one will help you:

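private void Validate(TextBox textBox1)
{
 // Matches any character that is not an ASCII letter, a space, or a tab.
 // Inside [...] a | is a literal and only a leading ^ negates, so neither is repeated.
 Regex rx = new Regex("[^A-Za-z \t]");
 if (rx.IsMatch(textBox1.Text))
  throw new Exception("Your error message");
}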

Useful links:

http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/84e4f7fa-5fff-427f-8c0e-d478cb38fa12

http://www.c-sharpcorner.com/Forums/Thread/177046/allow-only-20-alphabets-and-numbers-in-textbox-using-reg.aspx

0

This might help. It is not the most efficient way, but it is a simple non-regex validation:

foreach (char c in inputTextField)
{
    // Code points above 127 are outside the ASCII table.
    if (c > 127)
    {
        // throw an exception or run whatever logic you want here
    }
}
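The loop above rejects every non-ASCII character. If only non-ASCII letters should be rejected, as the question asks, the same idea could be sketched with char.IsLetter and LINQ. The class and method names are hypothetical.

```csharp
using System.Linq;

// Hypothetical helper, for illustration only.
public static class AsciiLetterCheck
{
    // Each character must be ASCII (<= 127) or not be a letter at all.
    public static bool HasOnlyEnglishLetters(string s) =>
        s.All(c => c <= 127 || !char.IsLetter(c));
}
```

HasOnlyEnglishLetters accepts "Foo 123 §", since § is not a letter, and rejects "Föo".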
warrior