7

I am writing validation for user input, using regular expressions in C#.

Password = @"(?!^[0-9]*$)(?!^[a-zA-Z]*$)^([a-zA-Z0-9]{6,18})$"

Validate Alpha Numeric = [^a-zA-Z0-9ñÑáÁéÉíÍóÓúÚüÜ¡¿{0}]

These work fine for the Latin alphabet, but how can I extend them to work with the Cyrillic alphabet?

amateur

3 Answers

11

The basic approach to covering ranges of characters using regular expressions is to construct an expression of the form [A-Za-z], where A is the first letter of the range, and Z is the last letter of the range.

The problem is, there is no such thing as "The" Cyrillic alphabet: the alphabet is slightly different depending on the language. If you would like to cover the Russian version of Cyrillic, use [А-Яа-яЁё] (Ё and ё fall outside the А-я range in Unicode, so they must be listed explicitly). You would use a different range for, say, Serbian, because the last letter in its Cyrillic alphabet is Ш, not Я.

Another approach is to list all characters one-by-one. Simply find an authoritative reference for the alphabet that you want to put in a regexp, and put all characters for it into a pair of square brackets:

[АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзийклмнопрстуфхцчшщъыьэюя]
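
One pitfall with the range form is worth a quick check. The sketch below assumes the input may contain Ё/ё: those two letters sit at U+0401 and U+0451, outside the contiguous А-я run (U+0410 to U+044F), so `[А-Яа-я]` on its own rejects them. The class name `RussianRangeDemo` and the sample words are illustrative only, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RussianRangeDemo
{
    // Plain range: covers U+0410..U+044F only, so Ё/ё are not included.
    public static readonly Regex RangeOnly = new Regex(@"^[А-Яа-я]+$");

    // Same range with Ё/ё listed explicitly.
    public static readonly Regex RangeWithYo = new Regex(@"^[А-Яа-яЁё]+$");

    public static void Main()
    {
        Console.WriteLine(RangeOnly.IsMatch("привет"));  // True
        Console.WriteLine(RangeOnly.IsMatch("ёлка"));    // False: ё is outside the range
        Console.WriteLine(RangeWithYo.IsMatch("ёлка"));  // True
    }
}
```

The fully enumerated class above avoids the problem because it names Ё and ё directly.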
Sergey Kalinichenko
  • +1. Good point on "no Cyrillic alphabet": there are Cyrillic characters (@"\p{IsCyrillic}+"), but if one needs to limit to a given language, explicit enumeration is the way to go. – Alexei Levenkov Feb 16 '13 at 02:44
  • Thanks for this - how would I add this to the regular expressions that I provided above? – amateur Feb 16 '13 at 17:03
  • @amateur Just like this - `[^a-zA-ZА-Яа-я0-9ñÑáÁéÉíÍóÓúÚüÜ¡¿{0}]` – Sergey Kalinichenko Feb 16 '13 at 17:21
  • @dasblinkenlight the problem here is that you allow a set of Latin and Cyrillic characters but still don't support Greek, Hebrew, Arabic, Japanese, Chinese, Korean, etc. So I'd prefer Alexei Levenkov's solution if you don't need only specific characters and want your code to work worldwide. – ecth Aug 11 '16 at 08:25
9

You can use character classes if you need to allow characters of particular language or particular type:

@"\p{IsCyrillic}+" // Cyrillic letters
@"[\p{Ll}\p{Lt}]+" // lowercase and titlecase letters in any language (add \p{Lu} for uppercase, or use \p{L} for any letter)

In your case maybe "not a whitespace" would be enough: @"[^\s]+", or maybe "word character" (which includes digits and underscores): @"\w+".
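
A minimal sketch of how these classes behave in .NET, assuming whole-string validation with `^...$`. The class name `UnicodeClassDemo` and the sample strings are hypothetical. It also shows why `\p{Lt}` is not a substitute for uppercase: ordinary capitals such as П are in category `Lu`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeClassDemo
{
    // Named block U+0400..U+04FF; besides letters it holds a few signs and combining marks.
    public static readonly Regex CyrillicBlock = new Regex(@"^\p{IsCyrillic}+$");

    // Lowercase or titlecase letters only, as written in the snippet above.
    public static readonly Regex LowerOrTitle = new Regex(@"^[\p{Ll}\p{Lt}]+$");

    // Any Unicode letter, regardless of case or script.
    public static readonly Regex AnyLetter = new Regex(@"^\p{L}+$");

    public static void Main()
    {
        Console.WriteLine(CyrillicBlock.IsMatch("Привет")); // True
        Console.WriteLine(CyrillicBlock.IsMatch("Hello"));  // False
        Console.WriteLine(LowerOrTitle.IsMatch("Привет"));  // False: П is uppercase (Lu)
        Console.WriteLine(AnyLetter.IsMatch("Привет"));     // True
        Console.WriteLine(AnyLetter.IsMatch("abc123"));     // False: digits are not letters
    }
}
```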

Alexei Levenkov
  • +1 It's nice to know that there are convenient character classes for detecting various native alphabets. – Sergey Kalinichenko Feb 16 '13 at 02:54
  • `[\p{Ll}\p{Lt}]` I think some characters might be missing, but I don't know the exact difference between "title case" and "upper case"... http://msdn.microsoft.com/en-us/library/20bw873z.aspx#SupportedUnicodeGeneralCategories – nhahtdh Feb 16 '13 at 07:16
  • Just a random note: `\p{IsCyrillic}` means the [Cyrillic **block**](http://msdn.microsoft.com/en-us/library/20bw873z.aspx#SupportedNamedBlocks) in C#, but it means the [Cyrillic **script**](http://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode) (containing many blocks) in Java. – nhahtdh Feb 16 '13 at 07:25
1
Password = @"(?!^[0-9]*$)(?!^[А-Яа-я]*$)^([А-Яа-я0-9]{6,18})$"

Validate Alpha Numeric = [^а-яА-Я0-9ñÑáÁéÉíÍóÓúÚüÜ¡¿{0}]
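
Note that the password pattern above accepts Cyrillic letters only, so Latin passwords no longer pass. The sketch below compares it with a hypothetical variant that allows both Latin letters and the .NET `\p{IsCyrillic}` block. The class name `PasswordDemo`, the field names, and the combined pattern are assumptions for illustration, not something given in the answers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordDemo
{
    // Pattern from this answer: Russian range letters and digits only.
    public static readonly Regex RussianOnly = new Regex(
        @"(?!^[0-9]*$)(?!^[А-Яа-я]*$)^([А-Яа-я0-9]{6,18})$");

    // Hypothetical variant: Latin or Cyrillic-block letters plus digits,
    // still rejecting all-digit and all-letter input.
    public static readonly Regex LatinOrCyrillic = new Regex(
        @"(?!^[0-9]*$)(?!^[a-zA-Z\p{IsCyrillic}]*$)^([a-zA-Z0-9\p{IsCyrillic}]{6,18})$");

    public static void Main()
    {
        Console.WriteLine(RussianOnly.IsMatch("пароль123"));     // True
        Console.WriteLine(RussianOnly.IsMatch("password1"));     // False: Latin letters rejected
        Console.WriteLine(LatinOrCyrillic.IsMatch("password1")); // True
        Console.WriteLine(LatinOrCyrillic.IsMatch("пароль123")); // True
        Console.WriteLine(LatinOrCyrillic.IsMatch("пароль"));    // False: letters only
        Console.WriteLine(LatinOrCyrillic.IsMatch("123456"));    // False: digits only
    }
}
```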
KJW