
I have a description which is in English and Chinese.

How would I use a regex to say something like: if the line contains a Chinese character, do A; otherwise, do B?

Example:

电源: 110V/220W50-60HZ
功率:60W
光源:12V 150 W
尺寸:220x150x280mm
重量:2.3KG



Voltage : 110V/220W50-60HZ
Power : 60W
Bulb : 12V 150 W
Size : 220x150x280mm
Weight:2.3KG
Makoto
Lee
  • If the encoding is Unicode, English letters are codes 0x0041 to 0x005A and 0x0061 to 0x007A, and Chinese characters are codes 0x4E00 to 0x9FFF. Your regex could check for character code matches. – Bailey Parker Jul 30 '11 at 12:09
  • Why not search Stack Overflow? http://stackoverflow.com/questions/1550950/detect-chinese-multibyte-character-in-the-string – Kaken Bok Jul 30 '11 at 12:09
  • Do you want to translate from English to Chinese, or differentiate between them through a regexp? I don't really get the 'disifer' part. – Jimmie Lin Jul 30 '11 at 12:31

1 Answer


The most common Chinese characters fall within the CJK Unified Ideographs block: U+4E00..U+9FFF

If your regex extension has been built with Unicode support, \p{InCJK_Unified_Ideographs} is a good replacement for [\x{4E00}-\x{9FFF}] (which was in the link Jens Struwe gave).

You can find most (all?) of the Unicode ranges here: http://www.regular-expressions.info/unicode.html

I'm not sure what you want to achieve, but a good start might be to split your description into lines. Then, for each line, check whether it contains Chinese characters and run the appropriate regex. ;)
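The split-then-branch idea can be sketched roughly as follows. This is only an illustration in Python, which is an assumption on my part since the question does not name a language. Python's built-in `re` module does not support `\p{...}`, so the sketch uses the explicit U+4E00..U+9FFF range instead. The names `contains_chinese` and `split_by_language` are hypothetical, and the two lists merely stand in for "do A" and "do B".

```python
import re

# CJK Unified Ideographs block, U+4E00..U+9FFF (the range quoted above).
# Python's built-in `re` has no \p{...} support, so the range is explicit.
CHINESE_CHAR = re.compile(r"[\u4e00-\u9fff]")


def contains_chinese(line):
    """Return True if the line has at least one character in the block."""
    return CHINESE_CHAR.search(line) is not None


def split_by_language(description):
    """Split a description into lines and sort them into two buckets."""
    chinese_lines = []
    other_lines = []
    for line in description.splitlines():
        if not line.strip():
            continue  # skip blank lines between the two halves
        if contains_chinese(line):
            chinese_lines.append(line)  # placeholder for "do A"
        else:
            other_lines.append(line)    # placeholder for "do B"
    return chinese_lines, other_lines
```

Calling `split_by_language(description)` on the sample text should put the five Chinese spec lines in the first list and the five English ones in the second. Note that this block is shared with Japanese kanji, so the check cannot tell Chinese from Japanese text on its own.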

Jim DeLaHunt
Savageman