I am trying to use this regex in C# to match Chinese characters:
\p{Han}+
However, it fails at runtime with the error Unknown property 'Han'.
In principle, this requirement could be met with a Unicode script property in the regular expression.
However, C# (.NET) regular expressions do not support Unicode scripts (Unicode general categories are fine),
so the Regex constructor throws an ArgumentException
like this:
[System.ArgumentException: parsing "\p{Han}+" - Unknown property 'Han'.]
at System.Text.RegularExpressions.RegexCharClass.SetFromProperty(String capname, Boolean invert, String pattern)
at System.Text.RegularExpressions.RegexCharClass.AddCategoryFromName(String categoryName, Boolean invert, Boolean caseInsensitive, String pattern)
at System.Text.RegularExpressions.RegexParser.ScanBackslash()
at System.Text.RegularExpressions.RegexParser.ScanRegex()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, TimeSpan matchTimeout, Boolean useCache)
at System.Text.RegularExpressions.Regex..ctor(String pattern)
More details are referenced here.
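A minimal sketch to confirm this on your own runtime, assuming C# 9 or later (top-level statements); the helper name IsPatternSupported is hypothetical. It constructs each pattern and catches the ArgumentException that the regex parser throws:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns true if the .NET regex parser accepts the pattern,
// false if the constructor rejects it (for example, an unknown \p{...} name).
static bool IsPatternSupported(string pattern)
{
    try
    {
        _ = new Regex(pattern);
        return true;
    }
    catch (ArgumentException ex)
    {
        Console.WriteLine($"{pattern}: {ex.Message}");
        return false;
    }
}

Console.WriteLine(IsPatternSupported(@"\p{Han}+")); // script name: rejected
Console.WriteLine(IsPatternSupported(@"\p{L}+"));   // general category: accepted
```

On newer .NET versions the exception type is RegexParseException, which derives from ArgumentException, so the same catch clause applies.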
In .NET, you need to prepend Is
to Unicode block names.
I don't know what the corresponding block for Han is, or whether it is supported, but you can try:
\p{IsHan}+
See MSDN for a list of supported block names.
The Is prefix works for the blocks of other alphabets too. See an example for Greek and Latin.
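As an illustration, assuming C# 9 or later (FirstMatch is a hypothetical helper), the Greek and Basic Latin blocks are written with the same Is prefix:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first match of the pattern, or "" if none.
// IsGreek covers U+0370-U+03FF; IsBasicLatin covers U+0000-U+007F.
static string FirstMatch(string pattern, string input) => Regex.Match(input, pattern).Value;

string text = "αβγabc";
Console.WriteLine(FirstMatch(@"\p{IsGreek}+", text));      // only the Greek letters
Console.WriteLine(FirstMatch(@"\p{IsBasicLatin}+", text)); // only the ASCII letters
```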
On the .NET platform, this regex matches Chinese characters:
\p{IsCJKUnifiedIdeographs}+
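A small sketch that extracts every run of Chinese characters with this pattern, assuming C# 9 or later; ChineseRuns is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects each maximal run of characters from the
// CJK Unified Ideographs block (U+4E00-U+9FFF), exposed in .NET as
// \p{IsCJKUnifiedIdeographs}.
static string[] ChineseRuns(string input)
{
    var runs = new List<string>();
    foreach (Match m in Regex.Matches(input, @"\p{IsCJKUnifiedIdeographs}+"))
    {
        runs.Add(m.Value);
    }
    return runs.ToArray();
}

// Latin letters, spaces and punctuation are skipped; only the ideograph runs remain.
foreach (string run in ChineseRuns("Hello 你好, world 世界!"))
{
    Console.WriteLine(run);
}
```

This block covers the common ideographs only. Rarer characters in CJK Unified Ideographs Extension A (U+3400-U+4DBF) need \p{IsCJKUnifiedIdeographsExtensionA} added to the class, and ideographs outside the Basic Multilingual Plane are stored as surrogate pairs, which .NET block names do not cover.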
This might work:
\p{L}
That would match letters from any alphabet, though. If you want only Chinese characters (no English ones), I may need more time.
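To see why \p{L} alone is too broad, this sketch (C# 9 or later; LetterRuns is a hypothetical helper) collects every run of letters. The Latin word is matched along with the Chinese one, while digits are not, because \p{L} is the Unicode "Letter" general category:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects each maximal run of Unicode letters (\p{L}),
// regardless of script.
static string[] LetterRuns(string input)
{
    var result = new List<string>();
    foreach (Match m in Regex.Matches(input, @"\p{L}+"))
    {
        result.Add(m.Value);
    }
    return result.ToArray();
}

// Both "abc" and "中文" are reported; "123" is not a letter run.
foreach (string run in LetterRuns("abc 中文 123"))
{
    Console.WriteLine(run);
}
```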
I am also assuming you are using Regex correctly. Test this code with \p{Han}+ to see whether it still fails:
using System;
using System.Text.RegularExpressions;

Regex regex = new Regex(@"\p{Han}+"); // the pattern from the question
Match match = regex.Match("YourString");
if (match.Success)
{
    Console.WriteLine("MATCH VALUE: " + match.Value);
}