
I have the string `$str` and I want to check whether its content contains Chinese characters or not (true/false):

$str = "赕就可消垻,只有当所有方块都被消垻时才可以过关";

Can you please help me?

Thanks! Adrian

Black
Adrian

4 Answers


You could use a Unicode character class (see http://www.regular-expressions.info/unicode.html):

preg_match("/\p{Han}+/u", $utf8_str);

This only checks for the presence of at least one Chinese character. If you need the complete string to consist of Chinese characters, you will have to expand on the pattern, for example by anchoring it.
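Since the question asks for a true/false answer, here is a minimal sketch wrapping the pattern in two helpers. The names `containsChinese` and `isAllChinese` are hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: true if $str contains at least one Han (Chinese) character.
// Assumes $str is UTF-8; the /u modifier makes PCRE treat the subject as UTF-8.
function containsChinese(string $str): bool
{
    return preg_match('/\p{Han}/u', $str) === 1;
}

// Hypothetical helper: true only if a non-empty $str consists entirely of Han characters.
// \z anchors at the very end of the subject, unlike $, which also accepts a trailing newline.
function isAllChinese(string $str): bool
{
    return preg_match('/^\p{Han}+\z/u', $str) === 1;
}

$str = "赕就可消垻,只有当所有方块都被消垻时才可以过关";
var_dump(containsChinese($str)); // true: the string contains Han characters
var_dump(isAllChinese($str));    // false: the commas are not Han characters
```

`isAllChinese` uses `\z` rather than `$` so that a string with a trailing newline is rejected; with `$` it would be accepted.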

mario
  • Great answer. I didn't know you could identify Unicode via regex! – Peter Feb 07 '11 at 16:01
  • @Peter: It's a bit recent. It depends on the version, but a Unicode-compatible PCRE (see `PCRE_VERSION`) should have been bundled since PHP 4. – mario Feb 07 '11 at 16:03
  • IIRC, this also depends on the PCRE library on the server having Unicode handling enabled. But it should be present on most modern servers. – Pekka Feb 07 '11 at 16:49
  • @Pekka: Ah nice, instant upgrade! It's indeed a shared library. (It probably depends on the build parameters. PHP 5.3.3 still bundles an outdated libpcre version 7.8 from 2008.) – mario Feb 07 '11 at 16:57
  • This answer is terrific. Chapeau. – Jamie Hollern Sep 19 '17 at 09:37
  • I tried it with this string and it doesn't catch it: 人中之龍,Yakuza 3,SEGA,PS4,重製版,《人中之龍 3》PS4 重製版 釋出全新預告及遊戲截圖,影音相關,Game LIFE 遊戲情報 (at least some of these characters appear to be Chinese according to online dictionaries). – alimack Aug 24 '18 at 10:43
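Two practical points follow from the comments above: PCRE must be built with Unicode property support, and `preg_match()` with the `/u` modifier returns `false` (not `0`) when the subject is not valid UTF-8, for example text still in Big5 or GBK. Invalid input is only one possible explanation for the last comment; the thread does not confirm it. A sketch that tells the cases apart, with hypothetical function names:

```php
<?php
// Hypothetical check: does this PCRE build support Unicode properties?
// The @ suppresses the warning PHP emits if the pattern fails to compile.
function pcreHasUnicodeSupport(): bool
{
    return @preg_match('/\p{L}/u', 'a') === 1;
}

// Hypothetical strict variant: returns true/false, or throws when matching fails.
function containsChineseStrict(string $str): bool
{
    $result = preg_match('/\p{Han}/u', $str);
    if ($result === false) {
        // PREG_BAD_UTF8_ERROR here means the subject bytes are not valid UTF-8,
        // e.g. a string that was never converted from a legacy encoding.
        throw new InvalidArgumentException('preg_match() failed with error code ' . preg_last_error());
    }
    return $result === 1;
}
```

Throwing instead of returning `false` keeps "no Chinese characters" and "could not check" from being confused.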

@mario's answer is right!

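For Chinese characters, use this regex: `/[\x{4e00}-\x{9fa5}]+/u`. Note that this range only covers the CJK Unified Ideographs from U+4E00 to U+9FA5, so it is narrower than `\p{Han}`.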

And don't forget the `u` modifier!

Reference about the `u` modifier.

Thanks to mario!
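To see how the fixed range differs from `\p{Han}`, here is a small comparison. U+3400 is the first character of CJK Extension A: it lies outside `\x{4e00}-\x{9fa5}` but is still in the Han script. The helper name is hypothetical, and the `\u{...}` string escape assumes PHP 7 or later.

```php
<?php
// Hypothetical comparison of the two approaches on a single character.
function compareChinesePatterns(string $char): array
{
    return [
        'range' => preg_match('/[\x{4e00}-\x{9fa5}]/u', $char) === 1, // fixed code-point range
        'han'   => preg_match('/\p{Han}/u', $char) === 1,             // Unicode script property
    ];
}

var_dump(compareChinesePatterns("中"));       // both true: U+4E2D is inside the range
var_dump(compareChinesePatterns("\u{3400}")); // range false, han true: CJK Extension A
var_dump(compareChinesePatterns("a"));        // both false
```

Unless you deliberately want to restrict matches to that block, `\p{Han}` is probably the safer choice.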

Darshan Lila
eaglewu
preg_match("/^\p{Han}{2,10}+$/u", $str);

Use the `/^\p{Han}{2,10}+$/u` regex, which:

  1. allows Chinese characters only,
  2. requires a minimum of 2 characters, and
  3. allows a maximum of 10 characters.

You can change the minimum and maximum number of characters by adjusting `{2,10}` as needed.

`\p` and the `/u` modifier are essential, so please don't leave them out.
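A quick way to see how the bounds behave is to wrap the pattern in a function whose limits are parameters. `isChineseOnly` is a hypothetical name, and it keeps the answer's pattern, including `$` and the possessive `{2,10}+`.

```php
<?php
// Hypothetical wrapper around the answer's pattern: between $min and $max Han characters, nothing else.
// {m,n}+ is a possessive quantifier; here it gives the same result as {m,n}, since only $ follows it.
function isChineseOnly(string $str, int $min = 2, int $max = 10): bool
{
    $pattern = sprintf('/^\p{Han}{%d,%d}+$/u', $min, $max);
    return preg_match($pattern, $str) === 1;
}

foreach (["中文", "中", "中文abc", "一二三四五六七八九十百"] as $s) {
    echo $s, ' => ', var_export(isChineseOnly($s), true), PHP_EOL;
}
```

Only `中文` passes: `中` is too short, `中文abc` contains Latin letters, and the last string has 11 characters.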

Newton Singh

This link to a previous question on identifying simplified or traditional Chinese might give you some ideas. You don't actually specify which you mean, and I don't know Chinese well enough to recognise the difference.

Mark Baker
  • Hey, this is a great idea and has fewer dependencies than a Unicode regex. +1 – Pekka Feb 07 '11 at 16:49
  • @Pekka - I have to confess, I was surprised it actually worked (even if it did need a bit of help from bobince with the actual charsets)... just one of those theories that I'd never had a chance to try in practice. – Mark Baker Feb 07 '11 at 16:54