Regex for Chinese numerals in Java

Asked Feb 19 '14 at 16:57

Active Oct 07 '14 at 09:48

Viewed 327 times

0

The regular expression \p{N} does not match Chinese numerals.

Please suggest a correct regex for this in Java.

edited Oct 07 '14 at 09:48

nhahtdh


asked Feb 19 '14 at 16:57

abhishek walia


possible duplicate of [Use regular expression to match ANY Chinese character in utf-8 encoding](http://stackoverflow.com/questions/9576384/use-regular-expression-to-match-any-chinese-character-in-utf-8-encoding) – Eel Lee Feb 19 '14 at 17:24

1 Answer

2

My answer is based on the Wikipedia article on Chinese numerals:

Common numerals: 0 to 10, hundred, thousand, ten thousand, hundred million

零〇一二三四五六七八九十百千: \u96f6\u3007\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d\u5341\u767e\u5343

(Simplified) 万亿: \u4e07\u4ebf
(Traditional) 萬億: \u842c\u5104
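As a rough illustration of why \p{N} falls short here: as far as I know, most of these ideographs are classified in Unicode as "other letter" (Lo) rather than as numbers, with 〇 (U+3007) being an exception. The sketch below checks this and then matches against an explicit character class built from the common numerals listed above. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

final class CommonNumeralCheck {
    // Character class built from the common numerals listed above
    // (zero to ten, hundred, thousand) plus both forms of
    // "ten thousand" and "hundred million".
    static final Pattern COMMON_NUMERALS = Pattern.compile(
        "[\u96f6\u3007\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d\u5341"
        + "\u767e\u5343\u4e07\u4ebf\u842c\u5104]+");

    // True if the whole string consists of common Chinese numerals.
    static boolean isCommonNumeral(String s) {
        return COMMON_NUMERALS.matcher(s).matches();
    }

    // True if the whole string consists of characters in Unicode category N.
    static boolean isUnicodeNumber(String s) {
        return s.matches("\\p{N}+");
    }

    public static void main(String[] args) {
        String three = "\u4e09"; // U+4E09, the numeral 3
        String zero = "\u3007";  // U+3007, ideographic number zero

        System.out.println(isUnicodeNumber(three)); // false
        System.out.println(isUnicodeNumber(zero));  // true
        // The numeral 3 is categorised as a letter, not a number.
        System.out.println(
            Character.getType(three.charAt(0)) == Character.OTHER_LETTER); // true
        // U+4E09 U+767E U+4E8C U+5341 reads "three hundred twenty".
        System.out.println(isCommonNumeral("\u4e09\u767e\u4e8c\u5341")); // true
    }
}
```

The explicit class is more verbose than a Unicode property, but it matches exactly the characters you choose to treat as numerals.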

Financial use

(Simplified) 零壹贰叁肆伍陆柒捌玖拾佰仟萬億: \u96f6\u58f9\u8d30\u53c1\u8086\u4f0d\u9646\u67d2\u634c\u7396\u62fe\u4f70\u4edf\u842c\u5104
(Traditional) 零壹貳參肆伍陸柒捌玖拾佰仟萬億: \u96f6\u58f9\u8cb3\u53c3\u8086\u4f0d\u9678\u67d2\u634c\u7396\u62fe\u4f70\u4edf\u842c\u5104

The two versions differ at 2, 3, and 6. Some of the characters (零, 萬, 億) overlap with the common numerals.

Large numbers beyond 10¹² and up to 10⁴⁴

(Traditional) 兆京垓秭穰溝澗正載: \u5146\u4eac\u5793\u79ed\u7a70\u6e9d\u6f97\u6b63\u8f09
(Simplified) 兆京垓秭穰沟涧正载: \u5146\u4eac\u5793\u79ed\u7a70\u6c9f\u6da7\u6b63\u8f7d

The two versions differ at the 6th, 7th and 9th characters.

(Some other alternatives) 經经杼壤: \u7d93\u7ecf\u677c\u58e4

Regional usage

(Traditional) 兩: \u5169
(Simplified) 两: \u4e24

The character above (兩/两, "two") is the one worth noting. Other regional variants are not commonly used.
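Putting the lists together, one possible way to use them in Java is to concatenate the groups into a single character class and scan for runs of numeral characters. This is only a sketch: the class name, constant names, and the `findNumeralRuns` helper are invented for illustration, and which groups to include is a judgment call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChineseNumeralRegex {
    // Common numerals, including simplified and traditional 10^4 and 10^8.
    static final String COMMON =
        "\u96f6\u3007\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d\u5341"
        + "\u767e\u5343\u4e07\u4ebf\u842c\u5104";
    // Financial numerals, simplified and traditional forms merged.
    static final String FINANCIAL =
        "\u58f9\u8d30\u8cb3\u53c1\u53c3\u8086\u4f0d\u9646\u9678\u67d2\u634c\u7396"
        + "\u62fe\u4f70\u4edf";
    // Large-number characters, simplified and traditional forms merged.
    static final String LARGE =
        "\u5146\u4eac\u5793\u79ed\u7a70\u6e9d\u6c9f\u6f97\u6da7\u6b63\u8f09\u8f7d";
    // Regional forms of "two".
    static final String REGIONAL = "\u5169\u4e24";

    // One or more consecutive characters from any of the groups.
    static final Pattern NUMERAL_RUN =
        Pattern.compile("[" + COMMON + FINANCIAL + LARGE + REGIONAL + "]+");

    // Returns every maximal run of numeral characters found in the text.
    static List<String> findNumeralRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = NUMERAL_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Sample text: "total 320 yuan, 100 yuan" written with numerals
        // from the common and financial lists.
        String text = "\u5171\u4e09\u767e\u4e8c\u5341\u5143 \u58f9\u4f70\u5143";
        for (String run : findNumeralRuns(text)) {
            System.out.println(run);
        }
    }
}
```

Note that this only recognises the characters, not whether a run is a well-formed number. Some of the large-number characters, such as 京 and 正, are also ordinary words, so dropping the `LARGE` group may reduce false positives in running text.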

answered Feb 19 '14 at 19:03

nhahtdh
