
I need to check whether a string contains Chinese characters. After searching, I found that I should match against the range \u31C0-\u31EF with a regex, but I can't get the regex to work.

Has anyone dealt with this before? Is the regex correct?

herohuyongtao
Tazz
  • Using `"[\u31C0-\u31EF]"` will indeed match any character whose code point is in the range `0x31C0` to `0x31EF`. You need the square brackets. I have no idea whether the actual numbers are correct; there are only 48 characters in this range, and I thought CJK had a lot more than that, but what do I know? – ajb Feb 26 '14 at 17:29
  • There's definitely more characters in CJK, see [here](http://en.wikipedia.org/wiki/CJK_Unified_Ideographs). – juan.facorro Feb 26 '14 at 17:37
  • The duplicate is not marked with a java tag. Is this really a duplicate? – Suragch Jan 31 '17 at 10:48

1 Answer


As discussed here, in Java 7 and later (where the regex compiler meets requirement RL1.2 Properties of UTS #18, Unicode Regular Expressions), you can use the following regex to match a Chinese (well, CJK) character:

\p{script=Han}

which, in Java, can be abbreviated to

\p{IsHan}
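
For illustration, here is a minimal sketch of how the check might look in Java 7 or later. The class name `HanCheck` and the method `containsHan` are made up for this example. It uses the `\p{script=Han}` form from above; `java.util.regex.Pattern` also documents `\p{sc=Han}` and `\p{IsHan}` as script forms. The last line shows why the range from the question does not help: an ordinary ideograph such as U+4E2D lies outside \u31C0-\u31EF.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
final class HanCheck {
    // Matches one code point whose Unicode script property is Han.
    private static final Pattern HAN = Pattern.compile("\\p{script=Han}");

    // The range from the question, wrapped in a character class.
    private static final Pattern QUESTION_RANGE = Pattern.compile("[\\u31C0-\\u31EF]");

    // Returns true if the text contains at least one Han character.
    static boolean containsHan(String text) {
        return HAN.matcher(text).find();
    }

    public static void main(String[] args) {
        String mixed = "hello \u4E16\u754C";   // "hello" followed by two Han ideographs
        String latin = "hello";
        String hiragana = "\u3072";            // a Hiragana letter, script is not Han

        System.out.println(containsHan(mixed));     // true
        System.out.println(containsHan(latin));     // false
        System.out.println(containsHan(hiragana));  // false

        // U+4E2D is a common ideograph, but it is outside the question's range.
        System.out.println(QUESTION_RANGE.matcher("\u4E2D").find()); // false
    }
}
```

Using `find()` answers "does the string contain any such character"; to require that the whole string consists of Han characters, something like `\p{script=Han}+` with `matches()` would be the usual variation.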
Community
herohuyongtao