From this PHP page I know that \P{C} matches every character except the invisible control characters. When I try the pattern [\P{C}]* on this regex test site, it does match the Chinese characters.

However, on my PHP 5.6.30 hosted on Apache, this code

preg_match_all('#([\P{C}]*)#', '中文', $t_matches, PREG_SET_ORDER);
var_dump($t_matches);

does not match the Chinese characters correctly. But the following code does:

preg_match_all('#([^\n]*)#', '中文', $t_matches, PREG_SET_ORDER);
var_dump($t_matches);

I know how to correctly match Chinese characters from this post. I am just wondering why [\P{C}]* failed.

Zipher
  • Flags: `'#([\P{C}]*)#u'` – Mark Baker May 22 '17 at 10:04
  • When you deal with Unicode strings, use the [UNICODE modifier](http://stackoverflow.com/documentation/regex/5138/regex-modifiers-flags/18161/unicode-modifier). – Wiktor Stribiżew May 22 '17 at 10:04
  • Shouldn't it be the lower-case `p` ? – CD001 May 22 '17 at 10:04
  • @CD001: No, because the `\P{C}` matches any char but a control char. `\p{C}` won't match any letters, it only matches control chars. – Wiktor Stribiżew May 22 '17 at 10:04
  • @WiktorStribiżew Ah - gotcha, I was thinking of something like `\p{L}` where you're matching any unicode letter rather than *not* matching control characters :) – CD001 May 22 '17 at 10:05
  • Thank you for all the help and the reference! I learned and added the UNICODE modifier `u` in the pattern and the problem is solved. – Zipher May 22 '17 at 23:29
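
For anyone landing here later, the fix from the comments can be sketched as below. This is a minimal illustration, not a definitive explanation of PCRE internals: it assumes the source file is saved as UTF-8, the variable names are made up for the example, and it uses `+` instead of the `*` from the question so that empty matches do not clutter the result. Without the `u` modifier the subject is handled byte by byte, so `\P{C}` is tested against the individual bytes of each character; with `u` it is tested against whole code points.

```php
<?php
// Assumption: this file is saved as UTF-8, so '中文' is six bytes long.
$subject = '中文';

// No u modifier: the subject is treated as a sequence of single bytes,
// and \P{C} is checked against each byte on its own.
preg_match_all('#[\P{C}]+#', $subject, $byteMatches);

// With the u modifier: pattern and subject are treated as UTF-8,
// and \P{C} is checked against each complete code point.
preg_match_all('#[\P{C}]+#u', $subject, $utf8Matches);

// Compare each result against the single expected match '中文'.
var_dump($byteMatches[0] === [$subject]);
var_dump($utf8Matches[0] === [$subject]);
```

Only the second comparison should come out true, which is consistent with what the question reports on PHP 5.6.30.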
