Detect Chinese characters using Perl?

Question

Is there any way to detect Chinese characters using Perl? And is there a way to correctly split a string of Chinese characters on the dot symbol '.'?

http://stackoverflow.com/questions/1951613/how-to-match-chinese-character-using-perls-regex — dnatoli, Aug 04 '11 at 06:47

daxim · Answer 1 · 2011-08-04T10:37:27.330

Depends on your particular notion of what is a Chinese character. Perhaps you're looking for /\p{Script=Hani}/, but if we want to cast our net wide, the following regex pattern will match stuff that occurs in Chinese writing. Restrict if necessary.

use 5.014;
/
    (?: \p{Block=CJK_Compatibility}
    |   \p{Block=CJK_Compatibility_Forms}
    |   \p{Block=CJK_Compatibility_Ideographs}
    |   \p{Block=CJK_Compatibility_Ideographs_Supplement}
    |   \p{Block=CJK_Radicals_Supplement}
    |   \p{Block=CJK_Strokes}
    |   \p{Block=CJK_Symbols_And_Punctuation}
    |   \p{Block=CJK_Unified_Ideographs}
    |   \p{Block=CJK_Unified_Ideographs_Extension_A}
    |   \p{Block=CJK_Unified_Ideographs_Extension_B}
    |   \p{Block=CJK_Unified_Ideographs_Extension_C}
    )
/x;
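
As a minimal sketch of the narrower /\p{Script=Hani}/ test, the following wraps it in two small helpers. The names has_han and han_chars are made up for illustration and are not from any module. It assumes the source file is saved as UTF-8, hence use utf8.

```perl
use 5.014;
use utf8;                               # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));     # encode output as UTF-8

# Hypothetical helper: 1 if the string contains at least one Han character.
sub has_han {
    my ($text) = @_;
    return $text =~ /\p{Script=Hani}/ ? 1 : 0;
}

# Hypothetical helper: list of the Han characters found in the string.
sub han_chars {
    my ($text) = @_;
    my @found = $text =~ /(\p{Script=Hani})/g;
    return @found;
}

say has_han('ice cream');               # 0
say has_han('冰淇淋 ice cream');         # 1
say join ',', han_chars('a冰b淇c淋');    # 冰,淇,淋
```

Note that punctuation such as 。 (U+3002) belongs to the Common script, so Script=Hani alone will not match it. That is one reason to widen the net with blocks like CJK_Symbols_And_Punctuation.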

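Yes, an unescaped . in a regex matches one character (any character except a newline, by default). The empty pattern for split does what you mean (DWYM) and breaks the string into individual characters: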

use utf8;
split //, '冰淇淋'
# returns ('冰', '淇', '淋')
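
If the goal is instead to split on a literal dot, the dot has to be escaped in the pattern. A short sketch, assuming a sample string with ASCII dots between the characters:

```perl
use 5.014;
use utf8;                               # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));     # encode output as UTF-8

my $text = '冰.淇.淋';

# Escape the dot so it is a literal separator, not "any character".
my @parts = split /\./, $text;
say join '|', @parts;                   # 冰|淇|淋

# The empty pattern splits between every character, dots included.
my @chars = split //, $text;
say scalar @chars;                      # 5
```

Without use utf8 (or an explicit decode of input data), Perl would treat the string as bytes and split // would cut the multi-byte characters apart.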

What if the string is 冰.淇. and I only want to split off the last dot, not every dot in the string? — deepWebMie, Aug 05 '11 at 03:17
PerlDoc page on this technique: http://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5Cp%7B%7D-and-%5CP%7B%7D — jhclark, Jun 29 '12 at 18:18
