The regular expression \p{N} is not recognising Chinese numerals.

Please suggest a correct regex for this in Java.

nhahtdh
  • possible duplicate of [Use regular expression to match ANY Chinese character in utf-8 encoding](http://stackoverflow.com/questions/9576384/use-regular-expression-to-match-any-chinese-character-in-utf-8-encoding) – Eel Lee Feb 19 '14 at 17:24

1 Answer

My answer is based on the Wikipedia article on Chinese numerals:

  • Common numerals: 0 to 10, hundred, thousand, ten thousand, hundred million

    零〇一二三四五六七八九十百千: \u96f6\u3007\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d\u5341\u767e\u5343

    (Simplified) 万亿: \u4e07\u4ebf
    (Traditional) 萬億: \u842c\u5104

  • Financial use

    (Simplified) 零壹贰叁肆伍陆柒捌玖拾佰仟萬億: \u96f6\u58f9\u8d30\u53c1\u8086\u4f0d\u9646\u67d2\u634c\u7396\u62fe\u4f70\u4edf\u842c\u5104
    (Traditional) 零壹貳參肆伍陸柒捌玖拾佰仟萬億: \u96f6\u58f9\u8cb3\u53c3\u8086\u4f0d\u9678\u67d2\u634c\u7396\u62fe\u4f70\u4edf\u842c\u5104

    The two versions differ at 2, 3, and 6. Some of these characters overlap with the common numerals.

  • Large numbers beyond 10^12 and up to 10^44

    (Traditional) 兆京垓秭穰溝澗正載: \u5146\u4eac\u5793\u79ed\u7a70\u6e9d\u6f97\u6b63\u8f09
    (Simplified) 兆京垓秭穰沟涧正载: \u5146\u4eac\u5793\u79ed\u7a70\u6c9f\u6da7\u6b63\u8f7d

    The two versions differ at the 6th, 7th and 9th characters.

    (Some other alternatives) 經经杼壤: \u7d93\u7ecf\u677c\u58e4

  • Regional usage

    (Traditional) 兩: \u5169
    (Simplified) 两: \u4e24

    Only the character above (an alternative form of "two") is worth noting. Other regional variants are not commonly used.
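
The reason \p{N} misses most of these is that Unicode classifies them as letters (general category Lo) rather than numbers; 〇 (U+3007, category Nl) is an exception. A minimal sketch of how the code points listed above could be combined into one explicit character class follows. The class name ChineseNumerals and the method isChineseNumeral are made up for illustration, and the rarer alternative forms (經经杼壤) are left out on the assumption that they are not needed:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class ChineseNumerals {
    // One character class built from the code points listed above.
    // Characters repeated across groups are harmless inside a class.
    private static final String NUMERAL_CLASS = "["
            // common numerals: 0 to 10, hundred, thousand
            + "\\u96f6\\u3007\\u4e00\\u4e8c\\u4e09\\u56db\\u4e94\\u516d"
            + "\\u4e03\\u516b\\u4e5d\\u5341\\u767e\\u5343"
            // ten thousand, hundred million (simplified, then traditional)
            + "\\u4e07\\u4ebf\\u842c\\u5104"
            // financial forms, simplified and traditional merged
            + "\\u58f9\\u8d30\\u8cb3\\u53c1\\u53c3\\u8086\\u4f0d\\u9646\\u9678"
            + "\\u67d2\\u634c\\u7396\\u62fe\\u4f70\\u4edf"
            // large numbers, traditional and simplified merged
            + "\\u5146\\u4eac\\u5793\\u79ed\\u7a70\\u6e9d\\u6c9f\\u6f97\\u6da7"
            + "\\u6b63\\u8f09\\u8f7d"
            // regional forms of "two"
            + "\\u5169\\u4e24"
            + "]";

    private static final Pattern NUMERALS = Pattern.compile(NUMERAL_CLASS + "+");

    // True if the whole string consists of one or more listed numeral characters.
    static boolean isChineseNumeral(String s) {
        return NUMERALS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Escapes keep the source ASCII-only; the first argument spells 123.
        System.out.println(isChineseNumeral("\u4e00\u767e\u4e8c\u5341\u4e09")); // true
        System.out.println(isChineseNumeral("abc"));                            // false
        System.out.println("\u4e00".matches("\\p{N}")); // false: category Lo
        System.out.println("\u3007".matches("\\p{N}")); // true: category Nl
    }
}
```

This only checks which characters appear, not whether they form a well-formed number. Several of the large-number characters (for example 正 and 京) are also ordinary words, so dropping that group may be safer if false positives matter.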

nhahtdh